Contextual bandits
Hi, Does Spark MLlib support contextual bandits? How can we use Spark MLlib to implement a contextual bandit? Thanks. Best regards, Ey-Chih
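[Editor's note: MLlib has no built-in bandit API. A common workaround is to train a per-arm reward model offline on logged (context, arm, reward) data and pick actions epsilon-greedily at serving time. Below is a minimal Scala sketch of that idea, not an MLlib feature; the RDD `logged`, the arm ids, and the epsilon value are illustrative assumptions.]

    import org.apache.spark.rdd.RDD
    import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.regression.LabeledPoint
    import scala.util.Random

    // logged: (arm, LabeledPoint(reward, context)) records -- assumed to exist
    def trainPerArm(logged: RDD[(Int, LabeledPoint)], arms: Seq[Int]): Map[Int, LogisticRegressionModel] =
      arms.map { arm =>
        val model = new LogisticRegressionWithLBFGS().run(logged.filter(_._1 == arm).map(_._2))
        model.clearThreshold()   // predict() now returns P(reward = 1) instead of a 0/1 label
        arm -> model
      }.toMap

    // epsilon-greedy action selection given a new context
    def choose(context: Vector, models: Map[Int, LogisticRegressionModel], eps: Double = 0.1): Int = {
      val arms = models.keys.toSeq
      if (Random.nextDouble() < eps) arms(Random.nextInt(arms.size))   // explore
      else arms.maxBy(arm => models(arm).predict(context))             // exploit
    }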
corresponding sql for query against LocalRelation
Hi, For a query against a LocalRelation, does anybody know what the corresponding SQL looks like? Thanks. Best regards, Ey-Chih Chow
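[Editor's note: a LocalRelation typically comes from a DataFrame built over in-memory data; as far as I know, the closest SQL equivalent is an inline table built with VALUES, which Spark (2.x and later) also plans as a LocalRelation. A small illustrative sketch:]

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").appName("local-relation-demo").getOrCreate()
    import spark.implicits._

    // a DataFrame over local data; its analyzed logical plan is a LocalRelation
    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
    println(df.queryExecution.analyzed)   // LocalRelation [id#.., name#..]

    // a roughly equivalent SQL form: an inline table, also planned as a LocalRelation
    spark.sql("SELECT * FROM VALUES (1, 'a'), (2, 'b') AS t(id, name)").show()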
IncompatibleClassChangeError
Hi, I am using CDH5.3.2 now for a Spark project. I got the following exception: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected. I used all the CDH5.3.2 jar files in my pom file to generate the application jar file. What else should I do to fix the problem? Ey-Chih Chow
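[Editor's note: this error usually means code compiled against Hadoop 1, where TaskAttemptContext is a class, is running against Hadoop 2, where it is an interface, or vice versa. One hedged fix is to make the build compile against the same Hadoop line the cluster runs, e.g. by depending on the CDH hadoop-client and marking it provided; the version string below is my assumption for CDH 5.3.2, not taken from the original pom:]

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.5.0-cdh5.3.2</version>
      <scope>provided</scope>
    </dependency>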
how to improve performance of spark job with large input to executor?
Hi, I ran a Spark job. Each executor is allocated a chunk of input data. For an executor with a small chunk of input data, the performance is reasonably good. But for an executor with a large chunk of input data, the performance is not good. How can I tune Spark configuration parameters to get better performance for large input data? Thanks. Ey-Chih Chow
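[Editor's note: one common remedy for this kind of skew is to split the input into more, smaller partitions so no single executor processes one huge chunk, and to give executors more resources. A hedged sketch; the path and partition count are illustrative:]

    // read with more input splits, then rebalance explicitly
    val data = sc.textFile("s3://bucket/input", minPartitions = 400)   // hypothetical path
    val balanced = data.repartition(400)   // shuffles records evenly across executors

    // and/or raise executor resources at submit time, for example:
    //   spark-submit --conf spark.executor.memory=8g --conf spark.executor.cores=4 ...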
RE: no space left at worker node
In other words, the working command is:

/root/spark/bin/spark-submit --class com.crowdstar.etl.ParseAndClean --master spark://ec2-54-213-73-150.us-west-2.compute.amazonaws.com:7077 --deploy-mode cluster --total-executor-cores 4 file:///root/etl-admin/jar/spark-etl-0.0.1-SNAPSHOT.jar s3://pixlogstxt/ETL/input/2015/01/28/09/ file:///root/etl-admin/vertica/VERTICA.avdl file:///root/etl-admin/vertica/extras.json file:///root/etl-admin/jar/spark-etl-0.0.1-SNAPSHOT.jar s3://pixlogstxt/ETL/output/ -1

How can I change it to avoid copying the jar file to ./spark/work/app- ? Ey-Chih Chow
RE: no space left at worker node
Thanks. But, in spark-submit, I specified the jar file in the form of local:/spark-etl-0.0.1-SNAPSHOT.jar. It comes back with the following. What's wrong with this? Ey-Chih Chow

===
Date: Sun, 8 Feb 2015 22:27:17 -0800
Sending launch command to spark://ec2-54-213-73-150.us-west-2.compute.amazonaws.com:7077
Driver successfully submitted as driver-20150209185453-0010
... waiting before polling master for driver state
... polling master for driver state
State of driver-20150209185453-0010 is ERROR
Exception from cluster was: java.io.IOException: No FileSystem for scheme: local
java.io.IOException: No FileSystem for scheme: local
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1383)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
        at org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:148)
        at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:74)

Subject: Re: no space left at worker node
From: 2dot7kel...@gmail.com
To: eyc...@hotmail.com
CC: gen.tan...@gmail.com; user@spark.apache.org

Maybe, try with local: under the heading of Advanced Dependency Management here: https://spark.apache.org/docs/1.1.0/submitting-applications.html It seems this is what you want. Hope this helps. Kelvin
RE: no space left at worker node
Gen, Thanks for your information. The content of /etc/fstab at the worker node (r3.large) is:

#LABEL=/   /          ext4    defaults,noatime                                  1 1
tmpfs      /dev/shm   tmpfs   defaults                                          0 0
devpts     /dev/pts   devpts  gid=5,mode=620                                    0 0
sysfs      /sys       sysfs   defaults                                          0 0
proc       /proc      proc    defaults                                          0 0
/dev/sdb   /mnt       auto    defaults,noatime,nodiratime,comment=cloudconfig   0 0
/dev/sdc   /mnt2      auto    defaults,noatime,nodiratime,comment=cloudconfig   0 0

There is no entry for /dev/xvdb. Ey-Chih Chow

Date: Sun, 8 Feb 2015 12:09:37 +0100
Subject: Re: no space left at worker node
From: gen.tan...@gmail.com
To: eyc...@hotmail.com
CC: user@spark.apache.org

Hi, In fact, I met this problem before; it is a bug of AWS. Which type of machine do you use? If I guess well, you can check the file /etc/fstab. There would be a double mount of /dev/xvdb. If yes, you should: 1. stop hdfs, 2. umount /dev/xvdb at /, 3. restart hdfs. Hope this could be helpful. Cheers, Gen
RE: no space left at worker node
Thanks Gen. How can I check if /dev/sdc is well mounted or not? In general, the problem shows up when I submit the second or third job. The first job I submit will most likely succeed. Ey-Chih Chow

Date: Sun, 8 Feb 2015 18:18:03 +0100
Subject: Re: no space left at worker node
From: gen.tan...@gmail.com
To: eyc...@hotmail.com
CC: user@spark.apache.org

Hi, In fact, /dev/sdb is /dev/xvdb. It seems that there is no problem with a double mount. However, there is no information about /mnt2. You should check whether /dev/sdc is well mounted or not. The reply of Michael is a good solution for this type of problem. You can check his site. Cheers, Gen
RE: no space left at worker node
Thanks Michael. I didn't edit core-site.xml; we use the default one. I only saw hadoop.tmp.dir in core-site.xml, pointing to /mnt/ephemeral-hdfs. How can I edit the config file? Best regards, Ey-Chih

Date: Sun, 8 Feb 2015 16:51:32 +0000
From: m_albert...@yahoo.com
To: gen.tan...@gmail.com; eyc...@hotmail.com
CC: user@spark.apache.org
Subject: Re: no space left at worker node

You might want to take a look in core-site.xml and see what is listed as usable directories (hadoop.tmp.dir, fs.s3.buffer.dir). It seems that on EC2 the root disk is relatively small (8G), but the config files list a mnt directory under it. Somehow the system doesn't balance between the very small space it has under the root disk and the larger disks, so the root disk fills up while the others are unused. At my site, we wrote a boot script to edit these problems out of the config before Hadoop starts. -Mike
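[Editor's note: if one does repoint those directories at the large /mnt volume, the relevant core-site.xml entries would look roughly like this; the buffer path is an illustrative assumption, not a value from the original cluster:]

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/mnt/ephemeral-hdfs</value>
    </property>
    <property>
      <name>fs.s3.buffer.dir</name>
      <!-- hypothetical buffer location on the large ephemeral volume -->
      <value>/mnt/s3-buffer</value>
    </property>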
RE: no space left at worker node
Hi Gen, Thanks. I save my logs in a file under /var/log. This is the only place to save data. Will the problem go away if I use a better machine? Best regards, Ey-Chih Chow

Date: Sun, 8 Feb 2015 23:32:27 +0100
Subject: Re: no space left at worker node
From: gen.tan...@gmail.com
To: eyc...@hotmail.com
CC: user@spark.apache.org

Hi, I am sorry that I made a mistake. r3.large has only one SSD, which has been mounted on /mnt. Therefore there is no /dev/sdc. In fact, the problem is that there is no space under the / directory. So you should check whether your application writes data under this directory (for instance, saving files with file:///). If not, you can use watch du -sh during the running time to figure out which directory is expanding. Normally, only the /mnt directory, which is backed by the SSD, expands significantly, because the HDFS data is saved there. Then you can find the directory which caused the no-space problem and find out the specific reason. Cheers, Gen
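[Editor's note: a concrete form of the watch du -sh suggestion above; the interval and the set of paths are illustrative:]

    # re-run du every 5 seconds on the suspect directories while the job runs
    watch -n 5 du -sh /root/spark/work /mnt /var/log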
RE: no space left at worker node
Is there any way we can disable Spark copying the jar file to the corresponding directory? I have a fat jar and it is already copied to the worker nodes using the command copy-dir. Why does Spark need to save the jar to ./spark/work/appid each time a job gets started? Ey-Chih Chow

Date: Sun, 8 Feb 2015 20:09:32 -0800
Subject: Re: no space left at worker node
From: 2dot7kel...@gmail.com
To: eyc...@hotmail.com
CC: gen.tan...@gmail.com; user@spark.apache.org

I guess you may set the parameters below to clean the directories: spark.worker.cleanup.enabled, spark.worker.cleanup.interval, spark.worker.cleanup.appDataTtl. They are described here: http://spark.apache.org/docs/1.2.0/spark-standalone.html Kelvin
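[Editor's note: per the standalone docs, those are worker-side properties, so they are set in each worker's environment rather than per application; a sketch of conf/spark-env.sh, with illustrative interval and TTL values:]

    # enable periodic cleanup of ./spark/work (values in seconds, illustrative)
    export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=86400"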
RE: no space left at worker node
I found the problem: for each application, the Spark worker node saves the corresponding stdout and stderr under ./spark/work/appid, where appid is the id of the application. If I run several applications in a row, it runs out of space. In my case, the disk usage under ./spark/work/ is as follows:

1689784  ./app-20150208203033-0002/0
1689788  ./app-20150208203033-0002
40324    ./driver-20150208180505-0001
1691400  ./app-20150208180509-0001/0
1691404  ./app-20150208180509-0001
40316    ./driver-20150208203030-0002
40320    ./driver-20150208173156-
1649876  ./app-20150208173200-/0
1649880  ./app-20150208173200-
5152036  .

Any suggestion how to resolve it? Thanks. Ey-Chih Chow
RE: no space left at worker node
By the way, the input and output paths of the job are all in S3; I did not use HDFS paths as input or output. Best regards, Ey-Chih Chow
no space left at worker node
Hi, I submitted a Spark job to an EC2 cluster, using spark-submit. At a worker node, there is an exception of 'no space left on device' as follows.

==
15/02/08 01:53:38 ERROR logging.FileAppender: Error writing stream to file /root/spark/work/app-20150208014557-0003/0/stdout
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at org.apache.spark.util.logging.FileAppender.appendToFile(FileAppender.scala:92)
        at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:72)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
===

The command df showed the following information at the worker node:

Filesystem  1K-blocks     Used  Available  Use%  Mounted on
/dev/xvda1    8256920  8256456          0  100%  /
tmpfs         7752012        0    7752012    0%  /dev/shm
/dev/xvdb    30963708  1729652   27661192    6%  /mnt

Does anybody know how to fix this? Thanks. Ey-Chih Chow
synchronously submitting spark jobs
Hi, I would like to submit Spark jobs one by one, such that the next job is not submitted until the previous one succeeds. spark-submit can only submit jobs asynchronously. Is there any way I can submit jobs sequentially? Thanks. Ey-Chih Chow
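[Editor's note: one hedged workaround is that in client deploy mode spark-submit blocks until the application finishes and its exit code reflects success, so jobs can be chained in a shell script; in standalone cluster mode, by contrast, submission returns immediately, which matches the behavior described above. Class and jar names here are hypothetical:]

    #!/bin/sh
    set -e   # abort the chain as soon as one job fails
    spark-submit --master spark://master:7077 --class com.example.JobOne job-one.jar
    spark-submit --master spark://master:7077 --class com.example.JobTwo job-two.jar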
unknown issue in submitting a spark job
Hi, I submitted a job using spark-submit and got the following exception. Does anybody know how to fix this? Thanks. Ey-Chih Chow

15/01/29 08:53:10 INFO storage.BlockManagerMasterActor: Registering block manager ip-10-10-8-191.us-west-2.compute.internal:47722 with 6.6 GB RAM
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
        at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:265)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:94)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1128)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:935)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832)
        at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:109)
        at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala)
        ... 6 more
15/01/29 08:54:33 INFO storage.BlockManager: Removing RDD 1
15/01/29 08:54:33 ERROR actor.ActorSystemImpl: exception on LARS’ timer thread
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:397)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
        at java.lang.Thread.run(Thread.java:745)
15/01/29 08:54:33 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-scheduler-1] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:397)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
        at java.lang.Thread.run(Thread.java:745)
15/01/29 08:54:33 ERROR actor.ActorSystemImpl: exception on LARS’ timer thread
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at akka.dispatch.AbstractNodeQueue.<init>(AbstractNodeQueue.java:19)
        at akka.actor.LightArrayRevolverScheduler$TaskQueue.<init>(Scheduler.scala:431)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:397)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
        at java.lang.Thread.run(Thread.java:745)
15/01/29 08:54:33 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [Driver-scheduler-1] shutting down ActorSystem [Driver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at akka.dispatch.AbstractNodeQueue.<init>(AbstractNodeQueue.java:19)
        at akka.actor.LightArrayRevolverScheduler$TaskQueue.<init>(Scheduler.scala:431)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:397)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
        at java.lang.Thread.run(Thread.java:745)
15/01/29 08:54:33 WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, ip-10-10-8-191.us-west-2.compute.internal, 47722, 0) with no recent heart beats: 82575ms exceeds 45000ms
15/01/29 08:54:33 INFO spark.ContextCleaner: Cleaned RDD 1
15/01/29 08:54:33 WARN util.AkkaUtils: Error sending message in 1 attempts
akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#-538003375]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask
RE: unknown issue in submitting a spark job
The worker node has 15G memory, 1x32 GB SSD, and 2 cores. The data file is from S3. If I don't set mapred.max.split.size, it is fine with only one partition. Otherwise, it will generate an OOME. Ey-Chih Chow

From: moham...@glassbeam.com
To: eyc...@hotmail.com; user@spark.apache.org
Subject: RE: unknown issue in submitting a spark job
Date: Thu, 29 Jan 2015 21:16:13 +0000

Looks like the application is using a lot more memory than available. Could be a bug somewhere in the code or just an underpowered machine. Hard to say without looking at the code. Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded Mohammed
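[Editor's note: for reference, a hedged sketch of how that split size would be set on the Hadoop configuration the input RDD uses; the property name is taken from the thread, while the 64 MB value is an illustrative assumption:]

    // cap split size so one big S3 object becomes several partitions
    sc.hadoopConfiguration.set("mapred.max.split.size", (64 * 1024 * 1024).toString)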
RE: unknown issue in submitting a spark job
I use the default value, which I think is 512MB. If I change it to 1024MB, spark-submit will fail due to not enough memory for the RDD. Ey-Chih Chow

From: moham...@glassbeam.com
To: eyc...@hotmail.com; user@spark.apache.org
Subject: RE: unknown issue in submitting a spark job
Date: Fri, 30 Jan 2015 00:32:57 +0000

How much memory are you assigning to the Spark executor on the worker node? Mohammed
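[Editor's note: a hedged aside. The OutOfMemoryError in the trace occurs inside DriverWrapper during getSplits, i.e. in the driver process rather than in an executor, so raising driver memory at submit time may be the more relevant knob; the memory values below are illustrative:]

    spark-submit --deploy-mode cluster \
      --driver-memory 2g \
      --executor-memory 1g \
      --class com.crowdstar.etl.ParseAndClean spark-etl-0.0.1-SNAPSHOT.jar ...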
RE: spark 1.2 ec2 launch script hang
We found the problem and have already fixed it. Basically, spark-ec2 requires EC2 instances to have external IP addresses. You need to specify this in the AWS console.

From: nicholas.cham...@gmail.com
Date: Tue, 27 Jan 2015 17:19:21 +0000
Subject: Re: spark 1.2 ec2 launch script hang
To: charles.fed...@gmail.com; pzybr...@gmail.com; eyc...@hotmail.com
CC: user@spark.apache.org

For those who found that absolute vs. relative path for the pem file mattered, what OS and shell are you using? What version of Spark are you using? ~/ vs. absolute path shouldn't matter. Your shell will expand the ~/ to the absolute path before sending it to spark-ec2 (i.e. tilde expansion). Absolute vs. relative path (e.g. ../../path/to/pem) also shouldn't matter, since we fixed that for Spark 1.2.0. Maybe there's some case that we missed? Nick

On Tue Jan 27 2015 at 10:10:29 AM Charles Feduke charles.fed...@gmail.com wrote:
Absolute path means no ~, and also verify that you have the path to the file correct. For some reason the Python code does not validate that the file exists and will hang (this is the same reason why ~ hangs).

On Mon, Jan 26, 2015 at 10:08 PM Pete Zybrick pzybr...@gmail.com wrote:
Try using an absolute path to the pem file.
spark 1.2 ec2 launch script hang
Hi, I used the spark-ec2 script of Spark 1.2 to launch a cluster. I have modified the script according to https://github.com/grzegorz-dubicki/spark/commit/5dd8458d2ab9753aae939b3bb33be953e2c13a70 But the script still hung at the following message: Waiting for cluster to enter 'ssh-ready' state. Is there anything else I should do to make it succeed? Thanks. Ey-Chih Chow -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-2-ec2-launch-script-hang-tp21381.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: spark 1.1.0 save data to hdfs failed
Your first error definitely indicates you have the wrong version of Hadoop on the client side. It's not matching your HDFS version. And the second suggests you are mixing code compiled for different versions of Hadoop. I think you need to check what version of Hadoop your Spark is compiled for. For example, I saw a reference to CDH 5.2, which is Hadoop 2.5, but then you're showing that you are running an old Hadoop 1.x HDFS? There seem to be a number of possible incompatibilities here. On Fri, Jan 23, 2015 at 11:38 PM, ey-chih chow eyc...@hotmail.com wrote: Sorry, I still did not quite get your resolution. In my jar, there are the following three related classes: org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.class org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl$DummyReporter.class org/apache/hadoop/mapreduce/TaskAttemptContext.class I think the first two come from hadoop2 and the third from hadoop1. I would like to get rid of the first two. I checked my source code. It does have a place using the class (or interface in hadoop2) TaskAttemptContext. Do you mean I should make a separate jar for this portion of code, built with hadoop1, to get rid of the dependency? An alternative way is to modify the code in SparkHadoopMapReduceUtil.scala and put it into my own source code to bypass the problem. Any comment on this? Thanks. From: eyc...@hotmail.com To: so...@cloudera.com CC: user@spark.apache.org Subject: RE: spark 1.1.0 save data to hdfs failed Date: Fri, 23 Jan 2015 11:17:36 -0800 Thanks. I looked at the dependency tree. I did not see any dependent jar of hadoop-core from hadoop2. However, the jar built from maven has the class org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.class. Do you know why? Date: Fri, 23 Jan 2015 17:01:48 +0000 Subject: RE: spark 1.1.0 save data to hdfs failed From: so...@cloudera.com To: eyc...@hotmail.com Are you receiving my replies? I have suggested a resolution. Look at the dependency tree next. On Jan 23, 2015 2:43 PM, ey-chih chow eyc...@hotmail.com wrote: I looked into the source code of SparkHadoopMapReduceUtil.scala. I think it is broken in the following code:

def newTaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID): TaskAttemptContext = {
  val klass = firstAvailableClass(
    "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl", // hadoop2, hadoop2-yarn
    "org.apache.hadoop.mapreduce.TaskAttemptContext")          // hadoop1
  val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[TaskAttemptID])
  ctor.newInstance(conf, attemptId).asInstanceOf[TaskAttemptContext]
}

In other words, it is related to hadoop2, hadoop2-yarn, and hadoop1. Any suggestion on how to resolve it? Thanks. From: so...@cloudera.com Date: Fri, 23 Jan 2015 14:01:45 +0000 Subject: Re: spark 1.1.0 save data to hdfs failed To: eyc...@hotmail.com CC: user@spark.apache.org These are all definitely symptoms of mixing incompatible versions of libraries. I'm not suggesting you haven't excluded Spark / Hadoop, but this is not the only way Hadoop deps get into your app. See my suggestion about investigating the dependency tree. On Fri, Jan 23, 2015 at 1:53 PM, ey-chih chow eyc...@hotmail.com wrote: Thanks. But I think I already marked all the Spark and Hadoop deps as provided. Why is the cluster's version not used? Anyway, as I mentioned in the previous message, after changing the hadoop-client to version 1.2.1 in my maven deps, I already got past that exception and ran into another one, as indicated below. Any suggestion on this?
= Exception in thread main java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method
RE: spark 1.1.0 save data to hdfs failed
Thanks. I looked at the dependency tree. I did not see any dependent jar of hadoop-core from hadoop2. However, the jar built from maven has the class org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.class. Do you know why? Date: Fri, 23 Jan 2015 17:01:48 +0000 Subject: RE: spark 1.1.0 save data to hdfs failed From: so...@cloudera.com To: eyc...@hotmail.com Are you receiving my replies? I have suggested a resolution. Look at the dependency tree next. On Jan 23, 2015 2:43 PM, ey-chih chow eyc...@hotmail.com wrote: I looked into the source code of SparkHadoopMapReduceUtil.scala. I think it is broken in the following code:

def newTaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID): TaskAttemptContext = {
  val klass = firstAvailableClass(
    "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl", // hadoop2, hadoop2-yarn
    "org.apache.hadoop.mapreduce.TaskAttemptContext")          // hadoop1
  val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[TaskAttemptID])
  ctor.newInstance(conf, attemptId).asInstanceOf[TaskAttemptContext]
}

In other words, it is related to hadoop2, hadoop2-yarn, and hadoop1. Any suggestion on how to resolve it? Thanks. From: so...@cloudera.com Date: Fri, 23 Jan 2015 14:01:45 +0000 Subject: Re: spark 1.1.0 save data to hdfs failed To: eyc...@hotmail.com CC: user@spark.apache.org These are all definitely symptoms of mixing incompatible versions of libraries. I'm not suggesting you haven't excluded Spark / Hadoop, but this is not the only way Hadoop deps get into your app. See my suggestion about investigating the dependency tree. On Fri, Jan 23, 2015 at 1:53 PM, ey-chih chow eyc...@hotmail.com wrote: Thanks. But I think I already marked all the Spark and Hadoop deps as provided. Why is the cluster's version not used? Anyway, as I mentioned in the previous message, after changing the hadoop-client to version 1.2.1 in my maven deps, I already got past that exception and ran into another one, as indicated below. Any suggestion on this?
= Exception in thread main java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:191) at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.firstAvailableClass(SparkHadoopMapReduceUtil.scala:73) at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.newTaskAttemptContext(SparkHadoopMapReduceUtil.scala:35) at org.apache.spark.rdd.PairRDDFunctions.newTaskAttemptContext(PairRDDFunctions.scala:53) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:932) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832) at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:103) at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala) ... 6 more
RE: spark 1.1.0 save data to hdfs failed
Thanks. But I think I already marked all the Spark and Hadoop deps as provided. Why is the cluster's version not used? Anyway, as I mentioned in the previous message, after changing the hadoop-client to version 1.2.1 in my maven deps, I already got past that exception and ran into another one, as indicated below. Any suggestion on this? = Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:191) at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.firstAvailableClass(SparkHadoopMapReduceUtil.scala:73) at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.newTaskAttemptContext(SparkHadoopMapReduceUtil.scala:35) at org.apache.spark.rdd.PairRDDFunctions.newTaskAttemptContext(PairRDDFunctions.scala:53) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:932) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832) at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:103) at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala) ... 6 more From: so...@cloudera.com Date: Fri, 23 Jan 2015 10:41:12 +0000 Subject: Re: spark 1.1.0 save data to hdfs failed To: eyc...@hotmail.com CC: user@spark.apache.org So, you should not depend on Hadoop artifacts unless you use them directly. You should mark Hadoop and Spark deps as provided. Then the cluster's version is used at runtime with spark-submit. That's the usual way to do it, which works. If you need to embed Spark in your app and are running it outside the cluster for some reason, and you have to embed Hadoop and Spark code in your app, the versions have to match. You should also use mvn dependency:tree to see all the dependencies coming in. There may be many sources of a Hadoop dep. On Fri, Jan 23, 2015 at 1:05 AM, ey-chih chow eyc...@hotmail.com wrote: Thanks.
But after I replaced the maven dependency from

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.5.0-cdh5.2.0</version>
  <scope>provided</scope>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>

to

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>1.0.4</version>
  <scope>provided</scope>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
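Sean's advice above, expressed in Maven scope terms, has an sbt equivalent for anyone not using a pom; a sketch, with illustrative versions matching this thread:

// build.sbt sketch: the cluster supplies Spark and Hadoop at runtime,
// so they are compile-time-only ("provided") dependencies here.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.0" % "provided",
  "org.apache.hadoop" % "hadoop-client" % "1.0.4" % "provided"
)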
RE: spark 1.1.0 save data to hdfs failed
Sorry, I still did not quite get your resolution. In my jar, there are the following three related classes: org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.class org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl$DummyReporter.class org/apache/hadoop/mapreduce/TaskAttemptContext.class I think the first two come from hadoop2 and the third from hadoop1. I would like to get rid of the first two. I checked my source code. It does have a place using the class (or interface in hadoop2) TaskAttemptContext. Do you mean I should make a separate jar for this portion of code, built with hadoop1, to get rid of the dependency? An alternative way is to modify the code in SparkHadoopMapReduceUtil.scala and put it into my own source code to bypass the problem. Any comment on this? Thanks. From: eyc...@hotmail.com To: so...@cloudera.com CC: user@spark.apache.org Subject: RE: spark 1.1.0 save data to hdfs failed Date: Fri, 23 Jan 2015 11:17:36 -0800 Thanks. I looked at the dependency tree. I did not see any dependent jar of hadoop-core from hadoop2. However, the jar built from maven has the class org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.class. Do you know why? Date: Fri, 23 Jan 2015 17:01:48 +0000 Subject: RE: spark 1.1.0 save data to hdfs failed From: so...@cloudera.com To: eyc...@hotmail.com Are you receiving my replies? I have suggested a resolution. Look at the dependency tree next. On Jan 23, 2015 2:43 PM, ey-chih chow eyc...@hotmail.com wrote: I looked into the source code of SparkHadoopMapReduceUtil.scala. I think it is broken in the following code:

def newTaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID): TaskAttemptContext = {
  val klass = firstAvailableClass(
    "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl", // hadoop2, hadoop2-yarn
    "org.apache.hadoop.mapreduce.TaskAttemptContext")          // hadoop1
  val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[TaskAttemptID])
  ctor.newInstance(conf, attemptId).asInstanceOf[TaskAttemptContext]
}

In other words, it is related to hadoop2, hadoop2-yarn, and hadoop1. Any suggestion on how to resolve it? Thanks. From: so...@cloudera.com Date: Fri, 23 Jan 2015 14:01:45 +0000 Subject: Re: spark 1.1.0 save data to hdfs failed To: eyc...@hotmail.com CC: user@spark.apache.org These are all definitely symptoms of mixing incompatible versions of libraries. I'm not suggesting you haven't excluded Spark / Hadoop, but this is not the only way Hadoop deps get into your app. See my suggestion about investigating the dependency tree. On Fri, Jan 23, 2015 at 1:53 PM, ey-chih chow eyc...@hotmail.com wrote: Thanks. But I think I already marked all the Spark and Hadoop deps as provided. Why is the cluster's version not used? Anyway, as I mentioned in the previous message, after changing the hadoop-client to version 1.2.1 in my maven deps, I already got past that exception and ran into another one, as indicated below. Any suggestion on this?
= Exception in thread main java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:191) at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.firstAvailableClass(SparkHadoopMapReduceUtil.scala:73) at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.newTaskAttemptContext(SparkHadoopMapReduceUtil.scala:35) at org.apache.spark.rdd.PairRDDFunctions.newTaskAttemptContext(PairRDDFunctions.scala:53) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:932
RE: spark 1.1.0 save data to hdfs failed
After I changed the dependency to the following:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>1.2.1</version>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>

I got the following error. Any idea on this? Thanks. === Caused by: java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:191) at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.firstAvailableClass(SparkHadoopMapReduceUtil.scala:73) at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.newTaskAttemptContext(SparkHadoopMapReduceUtil.scala:35) at org.apache.spark.rdd.PairRDDFunctions.newTaskAttemptContext(PairRDDFunctions.scala:53) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:932) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832) at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:103) at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala) ... 6 more From: eyc...@hotmail.com To: so...@cloudera.com CC: yuzhih...@gmail.com; user@spark.apache.org Subject: RE: spark 1.1.0 save data to hdfs failed Date: Thu, 22 Jan 2015 17:05:26 -0800 Thanks. But after I replaced the maven dependency from

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.5.0-cdh5.2.0</version>
  <scope>provided</scope>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>

to

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>1.0.4</version>
  <scope>provided</scope>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>

the warning message still shows up in the namenode log. Is there anything else I need to do? Thanks. Ey-Chih Chow From: so...@cloudera.com Date: Thu, 22 Jan 2015 22:34:22 +0000 Subject: Re
RE: spark 1.1.0 save data to hdfs failed
I looked into the source code of SparkHadoopMapReduceUtil.scala. I think it is broken in the following code:

def newTaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID): TaskAttemptContext = {
  val klass = firstAvailableClass(
    "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl", // hadoop2, hadoop2-yarn
    "org.apache.hadoop.mapreduce.TaskAttemptContext")          // hadoop1
  val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[TaskAttemptID])
  ctor.newInstance(conf, attemptId).asInstanceOf[TaskAttemptContext]
}

In other words, it is related to hadoop2, hadoop2-yarn, and hadoop1. Any suggestion on how to resolve it? Thanks. From: so...@cloudera.com Date: Fri, 23 Jan 2015 14:01:45 +0000 Subject: Re: spark 1.1.0 save data to hdfs failed To: eyc...@hotmail.com CC: user@spark.apache.org These are all definitely symptoms of mixing incompatible versions of libraries. I'm not suggesting you haven't excluded Spark / Hadoop, but this is not the only way Hadoop deps get into your app. See my suggestion about investigating the dependency tree. On Fri, Jan 23, 2015 at 1:53 PM, ey-chih chow eyc...@hotmail.com wrote: Thanks. But I think I already marked all the Spark and Hadoop deps as provided. Why is the cluster's version not used? Anyway, as I mentioned in the previous message, after changing the hadoop-client to version 1.2.1 in my maven deps, I already got past that exception and ran into another one, as indicated below. Any suggestion on this? = Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:191) at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.firstAvailableClass(SparkHadoopMapReduceUtil.scala:73) at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.newTaskAttemptContext(SparkHadoopMapReduceUtil.scala:35) at org.apache.spark.rdd.PairRDDFunctions.newTaskAttemptContext(PairRDDFunctions.scala:53) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:932) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832) at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:103) at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala) ... 6 more
RE: spark 1.1.0 save data to hdfs failed
I also think the code is not robust enough. First, since Spark works with hadoop1, why does the code try hadoop2 first? Also, the following code only handles ClassNotFoundException; it should handle all the exceptions.

private def firstAvailableClass(first: String, second: String): Class[_] = {
  try {
    Class.forName(first)
  } catch {
    case e: ClassNotFoundException =>
      Class.forName(second)
  }
}

From: eyc...@hotmail.com To: so...@cloudera.com CC: user@spark.apache.org Subject: RE: spark 1.1.0 save data to hdfs failed Date: Fri, 23 Jan 2015 06:43:00 -0800 I looked into the source code of SparkHadoopMapReduceUtil.scala. I think it is broken in the following code:

def newTaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID): TaskAttemptContext = {
  val klass = firstAvailableClass(
    "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl", // hadoop2, hadoop2-yarn
    "org.apache.hadoop.mapreduce.TaskAttemptContext")          // hadoop1
  val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[TaskAttemptID])
  ctor.newInstance(conf, attemptId).asInstanceOf[TaskAttemptContext]
}

In other words, it is related to hadoop2, hadoop2-yarn, and hadoop1. Any suggestion on how to resolve it? Thanks. From: so...@cloudera.com Date: Fri, 23 Jan 2015 14:01:45 +0000 Subject: Re: spark 1.1.0 save data to hdfs failed To: eyc...@hotmail.com CC: user@spark.apache.org These are all definitely symptoms of mixing incompatible versions of libraries. I'm not suggesting you haven't excluded Spark / Hadoop, but this is not the only way Hadoop deps get into your app. See my suggestion about investigating the dependency tree. On Fri, Jan 23, 2015 at 1:53 PM, ey-chih chow eyc...@hotmail.com wrote: Thanks. But I think I already marked all the Spark and Hadoop deps as provided. Why is the cluster's version not used? Anyway, as I mentioned in the previous message, after changing the hadoop-client to version 1.2.1 in my maven deps, I already got past that exception and ran into another one, as indicated below. Any suggestion on this?
= Exception in thread main java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:191) at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.firstAvailableClass(SparkHadoopMapReduceUtil.scala:73) at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.newTaskAttemptContext(SparkHadoopMapReduceUtil.scala:35) at org.apache.spark.rdd.PairRDDFunctions.newTaskAttemptContext(PairRDDFunctions.scala:53) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:932) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832) at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:103) at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala) ... 6 more
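A minimal sketch of the more defensive lookup suggested above, widening the catch only to LinkageError (the superclass of the IncompatibleClassChangeError seen in this thread) rather than swallowing every exception; the wrapping object name is illustrative:

object HadoopCompat {
  // Sketch only: fall back to the hadoop1 class when loading the hadoop2
  // class fails for any linkage reason, not just ClassNotFoundException.
  def firstAvailableClass(first: String, second: String): Class[_] =
    try {
      Class.forName(first)
    } catch {
      case _: ClassNotFoundException | _: LinkageError =>
        Class.forName(second)
    }
}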
RE: spark 1.1.0 save data to hdfs failed
I looked into the namenode log and found this message: 2015-01-22 22:18:39,441 WARN org.apache.hadoop.ipc.Server: Incorrect header or version mismatch from 10.33.140.233:53776 got version 9 expected version 4 What should I do to fix this? Thanks. Ey-Chih From: eyc...@hotmail.com To: yuzhih...@gmail.com CC: user@spark.apache.org Subject: RE: spark 1.1.0 save data to hdfs failed Date: Wed, 21 Jan 2015 23:12:56 -0800 The hdfs release should be hadoop 1.0.4. Ey-Chih Chow Date: Wed, 21 Jan 2015 16:56:25 -0800 Subject: Re: spark 1.1.0 save data to hdfs failed From: yuzhih...@gmail.com To: eyc...@hotmail.com CC: user@spark.apache.org What hdfs release are you using? Can you check the namenode log around the time of the error below to see if there is some clue? Cheers On Wed, Jan 21, 2015 at 4:51 PM, ey-chih chow eyc...@hotmail.com wrote: Hi, I used the following fragment of a scala program to save data to hdfs:

contextAwareEvents
  .map(e => (new AvroKey(e), null))
  .saveAsNewAPIHadoopFile("hdfs://" + masterHostname + ":9000/ETL/output/" + dateDir,
    classOf[AvroKey[GenericRecord]],
    classOf[NullWritable],
    classOf[AvroKeyOutputFormat[GenericRecord]],
    job.getConfiguration)

But it failed with the following error messages. Is there anyone who can help? Thanks. Ey-Chih Chow = Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "ip-10-33-140-157/10.33.140.157"; destination host is: "ec2-54-203-58-2.us-west-2.compute.amazonaws.com":9000; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1415) at org.apache.hadoop.ipc.Client.call(Client.java:1364) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:744) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1925) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1079) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1075) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1075) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400) at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:145) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:900) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832) at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:101) at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala) ... 6 more Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1055) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:950) === -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-1-0-save-data-to-hdfs-failed-tp21305.html Sent from the Apache Spark User List mailing list archive at Nabble.com
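The namenode warning "got version 9 expected version 4" is the Hadoop IPC layer rejecting a Hadoop 2.x client that is talking to a Hadoop 1.x namenode. A quick hedged check of which Hadoop version the driver classpath actually provides, using Hadoop's standard VersionInfo utility (the object name here is illustrative):

import org.apache.hadoop.util.VersionInfo

object ClientVersionCheck {
  def main(args: Array[String]): Unit = {
    // Prints the Hadoop version bundled on the client classpath; it should
    // match the HDFS server (expected to be 1.0.4 in this thread).
    println("Hadoop client version: " + VersionInfo.getVersion)
  }
}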
RE: spark 1.1.0 save data to hdfs failed
Thanks. But after I replaced the maven dependency from

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.5.0-cdh5.2.0</version>
  <scope>provided</scope>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>

to

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>1.0.4</version>
  <scope>provided</scope>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>

the warning message still shows up in the namenode log. Is there anything else I need to do? Thanks. Ey-Chih Chow From: so...@cloudera.com Date: Thu, 22 Jan 2015 22:34:22 +0000 Subject: Re: spark 1.1.0 save data to hdfs failed To: eyc...@hotmail.com CC: yuzhih...@gmail.com; user@spark.apache.org It means your client app is using Hadoop 2.x and your HDFS is Hadoop 1.x. On Thu, Jan 22, 2015 at 10:32 PM, ey-chih chow eyc...@hotmail.com wrote: I looked into the namenode log and found this message: 2015-01-22 22:18:39,441 WARN org.apache.hadoop.ipc.Server: Incorrect header or version mismatch from 10.33.140.233:53776 got version 9 expected version 4 What should I do to fix this? Thanks. Ey-Chih From: eyc...@hotmail.com To: yuzhih...@gmail.com CC: user@spark.apache.org Subject: RE: spark 1.1.0 save data to hdfs failed Date: Wed, 21 Jan 2015 23:12:56 -0800 The hdfs release should be hadoop 1.0.4. Ey-Chih Chow Date: Wed, 21 Jan 2015 16:56:25 -0800 Subject: Re: spark 1.1.0 save data to hdfs failed From: yuzhih...@gmail.com To: eyc...@hotmail.com CC: user@spark.apache.org What hdfs release are you using? Can you check the namenode log around the time of the error below to see if there is some clue? Cheers On Wed, Jan 21, 2015 at 4:51 PM, ey-chih chow eyc...@hotmail.com wrote: Hi, I used the following fragment of a scala program to save data to hdfs:

contextAwareEvents
  .map(e => (new AvroKey(e), null))
  .saveAsNewAPIHadoopFile("hdfs://" + masterHostname + ":9000/ETL/output/" + dateDir,
    classOf[AvroKey[GenericRecord]],
    classOf[NullWritable],
    classOf[AvroKeyOutputFormat[GenericRecord]],
    job.getConfiguration)

But it failed with the following error messages. Is there anyone who can help? Thanks. Ey-Chih Chow = Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "ip-10-33-140-157/10.33.140.157"; destination host is: "ec2-54-203-58-2.us-west-2.compute.amazonaws.com":9000; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1415) at org.apache.hadoop.ipc.Client.call(Client.java:1364
RE: Spark 1.1.0 - spark-submit failed
Thanks for the help. I added the following dependency to my pom file and the problem went away.

<dependency> <!-- default Netty -->
  <groupId>io.netty</groupId>
  <artifactId>netty</artifactId>
  <version>3.6.6.Final</version>
</dependency>

Ey-Chih Date: Tue, 20 Jan 2015 16:57:20 -0800 Subject: Re: Spark 1.1.0 - spark-submit failed From: yuzhih...@gmail.com To: eyc...@hotmail.com CC: user@spark.apache.org Please check which netty jar(s) are on the classpath. NioWorkerPool(Executor workerExecutor, int workerCount) was added in netty 3.5.4 Cheers On Tue, Jan 20, 2015 at 4:15 PM, ey-chih chow eyc...@hotmail.com wrote: Hi, I issued the following command in an EC2 cluster launched using spark-ec2: ~/spark/bin/spark-submit --class com.crowdstar.cluster.etl.ParseAndClean --master spark://ec2-54-185-107-113.us-west-2.compute.amazonaws.com:7077 --deploy-mode cluster --total-executor-cores 4 file:///tmp/etl-admin/jar/spark-etl-0.0.1-SNAPSHOT.jar /ETL/input/2015/01/10/12/10Jan2015.avro file:///tmp/etl-admin/vertica/VERTICA.avdl file:///tmp/etl-admin/vertica/extras.json file:///tmp/etl-admin/jar/spark-etl-0.0.1-SNAPSHOT.jar The command failed with the following error logs in Spark-UI. Is there any suggestion on how to fix the problem? Thanks. Ey-Chih Chow == Launch Command: /usr/lib/jvm/java-1.7.0/bin/java -cp /root/spark/work/driver-20150120200843-/spark-etl-0.0.1-SNAPSHOT.jar:/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/lib/spark-assembly-1.1.0-hadoop1.0.4.jar:/root/spark/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark/lib/datanucleus-core-3.2.2.jar:/root/spark/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Dspark.executor.extraLibraryPath=/root/ephemeral-hdfs/lib/native/ -Dspark.executor.memory=13000m -Dspark.akka.askTimeout=10 -Dspark.cores.max=4 -Dspark.app.name=com.crowdstar.cluster.etl.ParseAndClean -Dspark.jars=file:///tmp/etl-admin/jar/spark-etl-0.0.1-SNAPSHOT.jar -Dspark.executor.extraClassPath=/root/ephemeral-hdfs/conf -Dspark.master=spark://ec2-54-203-58-2.us-west-2.compute.amazonaws.com:7077 -Dakka.loglevel=WARNING -Xms512M -Xmx512M org.apache.spark.deploy.worker.DriverWrapper akka.tcp://sparkwor...@ip-10-33-140-157.us-west-2.compute.internal:47585/user/Worker com.crowdstar.cluster.etl.ParseAndClean /ETL/input/2015/01/10/12/10Jan2015.avro file:///tmp/etl-admin/vertica/VERTICA.avdl file:///tmp/etl-admin/vertica/extras.json file:///tmp/etl-admin/jar/spark-etl-0.0.1-SNAPSHOT.jar SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/root/spark/work/driver-20150120200843-/spark-etl-0.0.1-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/root/spark/lib/spark-assembly-1.1.0-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 15/01/20 20:08:45 INFO spark.SecurityManager: Changing view acls to: root, 15/01/20 20:08:45 INFO spark.SecurityManager: Changing modify acls to: root, 15/01/20 20:08:45 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, ) 15/01/20 20:08:45 INFO slf4j.Slf4jLogger: Slf4jLogger started 15/01/20 20:08:45 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [Driver-akka.actor.default-dispatcher-3] shutting down ActorSystem [Driver] java.lang.NoSuchMethodError: org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(Ljava/util/concurrent/Executor;I)V at akka.remote.transport.netty.NettyTransport.<init>(NettyTransport.scala:282) at akka.remote.transport.netty.NettyTransport.<init>(NettyTransport.scala:239) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78) at scala.util.Try$.apply(Try.scala:161) at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73) at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84) at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84) at scala.util.Success.flatMap(Try.scala:200) at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84
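Since the root cause was two netty versions on the classpath, here is a small hedged sketch for locating every jar that provides the conflicting class, using only the standard ClassLoader API (the object name is illustrative):

import scala.collection.JavaConverters._

object FindNettyJars {
  def main(args: Array[String]): Unit = {
    // Each URL printed is a classpath entry containing the class named in
    // the NoSuchMethodError; more than one result means a version conflict.
    val resource = "org/jboss/netty/channel/socket/nio/NioWorkerPool.class"
    getClass.getClassLoader.getResources(resource).asScala.foreach(println)
  }
}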
spark 1.1.0 save data to hdfs failed
Hi, I used the following fragment of a scala program to save data to hdfs:

contextAwareEvents
  .map(e => (new AvroKey(e), null))
  .saveAsNewAPIHadoopFile("hdfs://" + masterHostname + ":9000/ETL/output/" + dateDir,
    classOf[AvroKey[GenericRecord]],
    classOf[NullWritable],
    classOf[AvroKeyOutputFormat[GenericRecord]],
    job.getConfiguration)

But it failed with the following error messages. Is there anyone who can help? Thanks. Ey-Chih Chow = Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "ip-10-33-140-157/10.33.140.157"; destination host is: "ec2-54-203-58-2.us-west-2.compute.amazonaws.com":9000; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1415) at org.apache.hadoop.ipc.Client.call(Client.java:1364) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:744) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1925) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1079) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1075) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1075) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400) at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:145) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:900) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832) at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:101) at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala) ... 6 more Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1055) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:950) === -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-1-0-save-data-to-hdfs-failed-tp21305.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: spark 1.1.0 save data to hdfs failed
The hdfs release should be hadoop 1.0.4. Ey-Chih Chow Date: Wed, 21 Jan 2015 16:56:25 -0800 Subject: Re: spark 1.1.0 save data to hdfs failed From: yuzhih...@gmail.com To: eyc...@hotmail.com CC: user@spark.apache.org What hdfs release are you using? Can you check the namenode log around the time of the error below to see if there is some clue? Cheers On Wed, Jan 21, 2015 at 4:51 PM, ey-chih chow eyc...@hotmail.com wrote: Hi, I used the following fragment of a scala program to save data to hdfs:

contextAwareEvents
  .map(e => (new AvroKey(e), null))
  .saveAsNewAPIHadoopFile("hdfs://" + masterHostname + ":9000/ETL/output/" + dateDir,
    classOf[AvroKey[GenericRecord]],
    classOf[NullWritable],
    classOf[AvroKeyOutputFormat[GenericRecord]],
    job.getConfiguration)

But it failed with the following error messages. Is there anyone who can help? Thanks. Ey-Chih Chow = Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "ip-10-33-140-157/10.33.140.157"; destination host is: "ec2-54-203-58-2.us-west-2.compute.amazonaws.com":9000; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1415) at org.apache.hadoop.ipc.Client.call(Client.java:1364) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:744) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1925) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1079) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1075) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1075) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400) at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:145) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:900) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832) at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:101) at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala) ...
6 more Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1055) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:950) === -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-1-0-save-data-to-hdfs-failed-tp21305.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Spark 1.1.0 - spark-submit failed
Hi, I issued the following command in an EC2 cluster launched using spark-ec2: ~/spark/bin/spark-submit --class com.crowdstar.cluster.etl.ParseAndClean --master spark://ec2-54-185-107-113.us-west-2.compute.amazonaws.com:7077 --deploy-mode cluster --total-executor-cores 4 file:///tmp/etl-admin/jar/spark-etl-0.0.1-SNAPSHOT.jar /ETL/input/2015/01/10/12/10Jan2015.avro file:///tmp/etl-admin/vertica/VERTICA.avdl file:///tmp/etl-admin/vertica/extras.json file:///tmp/etl-admin/jar/spark-etl-0.0.1-SNAPSHOT.jar The command failed with the following error logs in Spark-UI. Is there any suggestion on how to fix the problem? Thanks. Ey-Chih Chow == Launch Command: /usr/lib/jvm/java-1.7.0/bin/java -cp /root/spark/work/driver-20150120200843-/spark-etl-0.0.1-SNAPSHOT.jar:/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/lib/spark-assembly-1.1.0-hadoop1.0.4.jar:/root/spark/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark/lib/datanucleus-core-3.2.2.jar:/root/spark/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Dspark.executor.extraLibraryPath=/root/ephemeral-hdfs/lib/native/ -Dspark.executor.memory=13000m -Dspark.akka.askTimeout=10 -Dspark.cores.max=4 -Dspark.app.name=com.crowdstar.cluster.etl.ParseAndClean -Dspark.jars=file:///tmp/etl-admin/jar/spark-etl-0.0.1-SNAPSHOT.jar -Dspark.executor.extraClassPath=/root/ephemeral-hdfs/conf -Dspark.master=spark://ec2-54-203-58-2.us-west-2.compute.amazonaws.com:7077 -Dakka.loglevel=WARNING -Xms512M -Xmx512M org.apache.spark.deploy.worker.DriverWrapper akka.tcp://sparkwor...@ip-10-33-140-157.us-west-2.compute.internal:47585/user/Worker com.crowdstar.cluster.etl.ParseAndClean /ETL/input/2015/01/10/12/10Jan2015.avro file:///tmp/etl-admin/vertica/VERTICA.avdl file:///tmp/etl-admin/vertica/extras.json file:///tmp/etl-admin/jar/spark-etl-0.0.1-SNAPSHOT.jar SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/root/spark/work/driver-20150120200843-/spark-etl-0.0.1-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/root/spark/lib/spark-assembly-1.1.0-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 15/01/20 20:08:45 INFO spark.SecurityManager: Changing view acls to: root, 15/01/20 20:08:45 INFO spark.SecurityManager: Changing modify acls to: root, 15/01/20 20:08:45 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, ) 15/01/20 20:08:45 INFO slf4j.Slf4jLogger: Slf4jLogger started 15/01/20 20:08:45 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [Driver-akka.actor.default-dispatcher-3] shutting down ActorSystem [Driver] java.lang.NoSuchMethodError: org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(Ljava/util/concurrent/Executor;I)V at akka.remote.transport.netty.NettyTransport.<init>(NettyTransport.scala:282) at akka.remote.transport.netty.NettyTransport.<init>(NettyTransport.scala:239) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78) at scala.util.Try$.apply(Try.scala:161) at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73) at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84) at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84) at scala.util.Success.flatMap(Try.scala:200) at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84) at akka.remote.EndpointManager$$anonfun$8.apply(Remoting.scala:618) at akka.remote.EndpointManager$$anonfun$8.apply(Remoting.scala:610) at scala.collection.TraversableLike$WithFilter$$anonfun$map$2.apply(TraversableLike.scala:722) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:721) at akka.remote.EndpointManager.akka$remote$EndpointManager$$listens(Remoting.scala:610) at akka.remote.EndpointManager$$anonfun$receive$2.applyOrElse
serialization issue with mapPartitions
Hi, I ran into a problem with mapPartitions in the following piece of code:

    val sessions = sc
      .newAPIHadoopFile(
        ... path to an avro file ...,
        classOf[org.apache.avro.mapreduce.AvroKeyInputFormat[ByteBuffer]],
        classOf[AvroKey[ByteBuffer]],
        classOf[NullWritable],
        job.getConfiguration())
      .mapPartitions { valueIterator =>
        val config = job.getConfiguration()
        .
        .
        .
      }
      .collect()

Why does calling job.getConfiguration() inside the mapPartitions closure generate the following exception?

    Cause: java.io.NotSerializableException: org.apache.hadoop.mapreduce.Job

If I take out 'val config = job.getConfiguration()' in the mapPartitions closure, the code works fine, even though job.getConfiguration() also shows up in newAPIHadoopFile(). Ey-Chih Chow
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/serialization-issue-with-mapPartitions-tp20858.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: serialization issue with mapPartitions
I should rephrase my question as follows: how can I use the Hadoop Configuration associated with a HadoopRDD inside a function passed to mapPartitions? Thanks. Ey-Chih Chow
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/serialization-issue-with-mapPartitions-tp20858p20861.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
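One pattern that avoids the serialization problem (a sketch under assumptions, not a tested drop-in: sc is the SparkContext and job the mapreduce Job from the original snippet; the path and the configuration key below are placeholders): Hadoop's Job is not serializable, but its Configuration implements Writable, so it can be wrapped in Spark's SerializableWritable, broadcast, and unwrapped inside the closure. The closure then captures only the broadcast handle, never the Job:

    import java.nio.ByteBuffer

    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.io.NullWritable
    import org.apache.spark.SerializableWritable

    // Wrap the (non-serializable) Job's Configuration so it can be broadcast.
    val confBroadcast =
      sc.broadcast(new SerializableWritable(job.getConfiguration()))

    val sessions = sc
      .newAPIHadoopFile(
        "/path/to/input.avro",                  // placeholder path
        classOf[AvroKeyInputFormat[ByteBuffer]],
        classOf[AvroKey[ByteBuffer]],
        classOf[NullWritable],
        job.getConfiguration())                 // driver-side only; nothing is serialized here
      .mapPartitions { valueIterator =>
        // Unwrap the Configuration on the executor; only the broadcast
        // handle was captured by the closure, not the Job itself.
        val config = confBroadcast.value.value
        valueIterator.map(_ => config.get("fs.default.name")) // placeholder use of the config
      }
      .collect()

Alternatively, since Configuration entries are plain strings, the few values the closure needs can be copied into local vals on the driver before the mapPartitions call, so that no Hadoop object is captured at all.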
Debugging a Spark application using Eclipse throws SecurityException
I am using Eclipse to develop a Spark application (using Spark 1.1.0) and the ScalaTest framework to test it. But I was blocked by the following exception:

    java.lang.SecurityException: class javax.servlet.FilterRegistration's signer information does not match signer information of other classes in the same package

I use Maven for the project. The Maven dependencies are as follows:

    <dependencies>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.1.0-cdh5.2.0</version>
        <scope>provided</scope>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.1.0-cdh5.2.0</version>
        <scope>provided</scope>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.5.0-cdh5.2.0</version>
        <scope>provided</scope>
        <exclusions>
          <exclusion>
            <groupId>org.mortbay.jetty</groupId>
            <artifactId>servlet-api-2.5</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.mortbay.jetty</groupId>
            <artifactId>servlet-api</artifactId>
          </exclusion>
          <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
          </exclusion>
        </exclusions>
      </dependency>
      <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
        <version>1.7.6-cdh5.2.0</version>
      </dependency>
      <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro-tools</artifactId>
        <version>1.7.6-cdh5.2.0</version>
      </dependency>
      <dependency>
        <groupId>com.twitter</groupId>
        <artifactId>parquet-avro</artifactId>
        <version>1.5.0-cdh5.2.0</version>
      </dependency>
      <dependency>
        <groupId>com.cloudera.cdk</groupId>
        <artifactId>cdk-data-core</artifactId>
        <version>0.9.2</version>
        <scope>provided</scope>
      </dependency>
      <dependency>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest_2.10</artifactId>
        <version>2.2.1</version>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro-mapred</artifactId>
        <version>1.7.6-cdh5.2.0</version>
        <scope>provided</scope>
        <exclusions>
          <exclusion>
            <groupId>org.mortbay.jetty</groupId>
            <artifactId>servlet-api</artifactId>
          </exclusion>
        </exclusions>
      </dependency>
      <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>0.98.6-cdh5.2.0</version>
        <scope>provided</scope>
        <exclusions>
          <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>*</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.mortbay.jetty</groupId>
            <artifactId>servlet-api-2.5</artifactId>
          </exclusion>
          <exclusion>
            <groupId>javax.servlet.jsp</groupId>
            <artifactId>jsp-api</artifactId>
          </exclusion>
        </exclusions>
      </dependency>
      <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>0.98.6-cdh5.2.0</version>
        <scope>provided</scope>
        <exclusions>
          <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>*</artifactId>
          </exclusion>
        </exclusions>
      </dependency>
    </dependencies>

I would appreciate it if somebody could help me identify the issue. Best regards, Ey-Chih Chow
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Debugging-a-Spark-application-using-Eclipse-throws-SecurityException-tp20843.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Debugging a Spark application using Eclipse throws SecurityException
It's working now. I probably hadn't specified the exclusion list correctly; I kept revising it until it worked. Thanks. Ey-Chih Chow
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Debugging-a-Spark-application-using-Eclipse-throws-SecurityException-tp20843p20844.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
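For anyone who hits the same FilterRegistration SecurityException: the usual cause (an educated guess here, since the final pom is not shown) is a second, signed copy of the javax.servlet classes being pulled in transitively alongside the unsigned copy that Spark's Jetty provides. Among the exclusions in the pom above, the one that typically matters is the plain servlet-api on the Hadoop client, as in this sketch:

    <!-- Likely the operative exclusion: drop the duplicate/signed
         javax.servlet classes so only Spark's own copy remains on
         the test classpath. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.5.0-cdh5.2.0</version>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>servlet-api</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

Running mvn dependency:tree -Dincludes=javax.servlet shows which artifacts still contribute servlet classes.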