Re: Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode

2017-05-10 Thread Sofiane Cherchalli
I've put the CSV on the worker node, since the job runs on the worker. I
didn't put it on the master because I believe the master doesn't run jobs.

If I also put the CSV on the Zeppelin node, at the same path as on the
worker, Zeppelin reads the CSV and writes a _SUCCESS file locally. The job
runs on the worker too but doesn't terminate. Its result is saved under a
_temporary directory on the worker.

worker - ls -laRt /data/02.csv/


02.csv/:
total 0
drwxr-xr-x. 3 root root 24 Apr 28 09:55 .
drwxr-xr-x. 3 root root 15 Apr 28 09:55 _temporary
drwxr-xr-x. 3 root root 64 Apr 28 09:55 ..

02.csv/_temporary:
total 0
drwxr-xr-x. 5 root root 106 Apr 28 09:56 0
drwxr-xr-x. 3 root root  15 Apr 28 09:55 .
drwxr-xr-x. 3 root root  24 Apr 28 09:55 ..

02.csv/_temporary/0:
total 0
drwxr-xr-x. 5 root root 106 Apr 28 09:56 .
drwxr-xr-x. 2 root root   6 Apr 28 09:56 _temporary
drwxr-xr-x. 2 root root 129 Apr 28 09:56 task_20170428095632_0005_m_00
drwxr-xr-x. 2 root root 129 Apr 28 09:55 task_20170428095516_0002_m_00
drwxr-xr-x. 3 root root  15 Apr 28 09:55 ..

02.csv/_temporary/0/_temporary:
total 0
drwxr-xr-x. 2 root root   6 Apr 28 09:56 .
drwxr-xr-x. 5 root root 106 Apr 28 09:56 ..

02.csv/_temporary/0/task_20170428095632_0005_m_00:
total 52
drwxr-xr-x. 5 root root   106 Apr 28 09:56 ..
-rw-r--r--. 1 root root   376 Apr 28 09:56 .part-0-e39ebc76-5343-407e-b42e-c33e69b8fd1a.csv.crc
-rw-r--r--. 1 root root 46605 Apr 28 09:56 part-0-e39ebc76-5343-407e-b42e-c33e69b8fd1a.csv
drwxr-xr-x. 2 root root   129 Apr 28 09:56 .

02.csv/_temporary/0/task_20170428095516_0002_m_00:
total 52
drwxr-xr-x. 5 root root   106 Apr 28 09:56 ..
-rw-r--r--. 1 root root   376 Apr 28 09:55 .part-0-c2ac5299-26f6-4b23-a74b-b3dc96464271.csv.crc
-rw-r--r--. 1 root root 46605 Apr 28 09:55 part-0-c2ac5299-26f6-4b23-a74b-b3dc96464271.csv


zeppelin - ls -laRt 02.csv/


02.csv/:
total 12
drwxr-sr-x 2 root 1700 4096 Apr 28 09:56 .
-rw-r--r-- 1 root 1700    8 Apr 28 09:56 ._SUCCESS.crc
-rw-r--r-- 1 root 1700    0 Apr 28 09:56 _SUCCESS
drwxrwsr-x 5 root 1700 4096 Apr 28 09:56 ..
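
The two listings can also be compared programmatically. Below is a minimal
sketch, assuming the usual Hadoop output-committer layout (task output is
staged under _temporary, then moved up and marked with _SUCCESS when the job
commits). The helper name describe_output_dir is hypothetical and is not part
of Spark or Zeppelin.

```python
import os


def describe_output_dir(path):
    """Summarise the commit state of a Spark/Hadoop output directory.

    Hypothetical helper for illustration only; it just inspects the
    local filesystem of the node it runs on.
    """
    committed_parts = []
    uncommitted_parts = []
    for root, _dirs, files in os.walk(path):
        # Components of this directory's path relative to the output root.
        rel_parts = os.path.relpath(root, path).split(os.sep)
        in_temporary = "_temporary" in rel_parts
        for name in files:
            # Hidden checksum files (".part-....crc") are skipped on purpose.
            if name.startswith("part-"):
                if in_temporary:
                    uncommitted_parts.append(name)
                else:
                    committed_parts.append(name)
    return {
        # _SUCCESS is written at the top level once the job has committed.
        "success_marker": os.path.isfile(os.path.join(path, "_SUCCESS")),
        # A leftover _temporary directory means staged output was not moved.
        "temporary_left": os.path.isdir(os.path.join(path, "_temporary")),
        "committed_parts": sorted(committed_parts),
        "uncommitted_parts": sorted(uncommitted_parts),
    }
```

Run against /data/02.csv on the worker, this would report no _SUCCESS marker
and two part files still under _temporary. On the Zeppelin node it would
report the marker but no part files. That matches the listings above: the
tasks wrote their output on the worker, while the commit step ran against the
Zeppelin node's own filesystem.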




On Wed, May 10, 2017 at 14:06, Meethu Mathew wrote:

> Try putting the csv in the same path in all the nodes or in a mount point
> path which is accessible by all the nodes
>
> Regards,
>
>
> Meethu Mathew
>
>
> On Wed, May 10, 2017 at 3:36 PM, Sofiane Cherchalli 
> wrote:
>
>> Yes, I already tested with spark-shell and pyspark, with the same result.
>>
>> Can't I use the Linux filesystem to read a CSV, e.g. file:///data/file.csv?
>> My understanding is that the job is sent to and interpreted on the worker,
>> isn't it?
>>
>> Thanks.
>>
>> On Tue, May 9, 2017 at 20:23, Jongyoul Lee wrote:
>>
>>> Could you test if it works with spark-shell?
>>>
>>> On Sun, May 7, 2017 at 5:22 PM, Sofiane Cherchalli 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a standalone cluster, one master and one worker, running on
>>>> separate nodes. Zeppelin is running on a separate node too, in client
>>>> mode.
>>>>
>>>> When I run a notebook that reads a CSV file located on the worker
>>>> node with the Spark-CSV package, Zeppelin tries to read the CSV locally
>>>> and fails because the CSV is on the worker node and not on the Zeppelin
>>>> node.
>>>>
>>>> Is this the expected behavior?
>>>>
>>>> Thanks.

>>>
>>>
>>>
>>> --
>>> 이종열, Jongyoul Lee, 李宗烈
>>> http://madeng.net
>>>
>>
>


Re: Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode

2017-05-09 Thread Jongyoul Lee
Could you test if it works with spark-shell?

On Sun, May 7, 2017 at 5:22 PM, Sofiane Cherchalli 
wrote:

> Hi,
>
> I have a standalone cluster, one master and one worker, running on
> separate nodes. Zeppelin is running on a separate node too, in client
> mode.
>
> When I run a notebook that reads a CSV file located on the worker
> node with the Spark-CSV package, Zeppelin tries to read the CSV locally
> and fails because the CSV is on the worker node and not on the Zeppelin
> node.
>
> Is this the expected behavior?
>
> Thanks.
>



-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net