>
> *To:* Ewan Leith;
>
> *Cc:* user@spark.apache.org;
>
> *Subject:* Re: Problem while loading saved data
>
>
> Hi Ewan,
>
> Yes, 'people.parquet' is from the first attempt and in that attempt it
> tried to save the same people.json.
>
> It seems that the sam
Subject: Re: Problem while loading saved data
Hi Ewan,
Yes, 'people.parquet' is from the first attempt and in that attempt it tried to
save the same people.json.
It seems that the same folder is created on both the nodes and contents of the
files are distributed between the two servers.
On the maste
normally be created when the write completes, can you show us your write output?
Thanks,
Ewan
From: Amila De Silva [mailto:jaa...@gmail.com]
Sent: 03 September 2015 05:44
To: Guru Medasani <gdm...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: Problem while loading saved data
H
Hi All,
I have a two-node Spark cluster, which I'm connecting to from an IPython
notebook.
To see how data saving/loading works, I simply created a DataFrame from
people.json using the code below:
df = sqlContext.read.json("examples/src/main/resources/people.json")
Then called the following to
Hi Amila,
The error says that the ‘people.parquet’ file does not exist. Can you manually
check whether that file exists?
> Py4JJavaError: An error occurred while calling o53840.parquet.
> : java.lang.AssertionError: assertion failed: No schema defined, and no
> Parquet data file or summary file
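For the manual check suggested above, here is a rough sketch in plain Python that can be run on each node. The helper name `inspect_parquet_output` is made up for illustration, and it assumes the output went to a local filesystem path rather than HDFS. It relies on the default behaviour of the Hadoop output committer, which stages task output under `_temporary` and writes a `_SUCCESS` marker when the job commits.

```python
import os


def inspect_parquet_output(path):
    """Summarise what a Spark output directory currently contains.

    Hypothetical helper for illustration; it only looks at the local
    filesystem of the machine it runs on.
    """
    if not os.path.isdir(path):
        # Spark writes Parquet output as a directory, not a single file.
        return {"is_directory": False, "has_success": False,
                "has_temporary": False, "data_files": []}

    entries = sorted(os.listdir(path))
    return {
        "is_directory": True,
        # Marker written by default when the write job commits.
        "has_success": "_SUCCESS" in entries,
        # Staging directory; if it is still present, the job may not
        # have committed its output on this node.
        "has_temporary": "_temporary" in entries,
        # Committed Parquet part files at the top level.
        "data_files": [e for e in entries if e.endswith(".parquet")],
    }


if __name__ == "__main__":
    print(inspect_parquet_output("people.parquet"))
```

If `has_temporary` is true and `data_files` is empty, the directory holds only staged task output, which would be consistent with the "no Parquet data file or summary file" assertion in the error.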
Hi Guru,
Thanks for the reply.
Yes, I checked whether the file exists. Instead of a single file, I
found a directory with the following structure:
people.parquet
└── _temporary
    └── 0
        ├── task_201509030057_4699_m_00
        │   └──