java.lang.ClassNotFoundException
Hi, I have a small Spark program and I am getting an error that I don't understand. My code is https://gist.github.com/yaseminn/522a75b863ad78934bc3. I am using Spark 1.3.

Submitting:

bin/spark-submit --class MonthlyAverage --master local[4] weather.jar

Error:

~/spark-1.3.1-bin-hadoop2.4$ bin/spark-submit --class MonthlyAverage --master local[4] weather.jar
java.lang.ClassNotFoundException: MonthlyAverage
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:274)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:538)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

Please help me ASAP.

yasemin
--
hiç ender hiç
Re: Spark on YARN
Hi Jem,

Do they fail with any particular exception? Does YARN just never end up giving them resources? Does an application master start? If so, what is in its logs? If not, anything suspicious in the YARN ResourceManager logs?

-Sandy

On Fri, Aug 7, 2015 at 1:48 AM, Jem Tucker jem.tuc...@gmail.com wrote:
> Hi, I am running Spark on YARN on the CDH 5.3.2 stack. I have created a new user to own and run a testing environment. However, when using this user, the applications I submit to YARN never begin to run, even though they are the exact same applications that succeed as another user. Has anyone seen anything like this before?
> Thanks, Jem
Re: Spark on YARN
Hi Sandy,

The application doesn't fail: it gets accepted by YARN, but the application master never starts and the application state never changes to RUNNING. I have checked the ResourceManager and NodeManager logs and nothing jumps out.

Thanks,
Jem

On Sat, 8 Aug 2015 at 09:20, Sandy Ryza sandy.r...@cloudera.com wrote:
> Hi Jem,
> Do they fail with any particular exception? Does YARN just never end up giving them resources? Does an application master start? If so, what is in its logs? If not, anything suspicious in the YARN ResourceManager logs?
> -Sandy
Pagination on big table, splitting joins
Hi, I have two different parts in my system:

1. A batch application that every x minutes runs SQL queries joining several tables containing millions of rows to build entities, and sends those entities to Kafka.
2. A streaming application that processes the data from Kafka.

The whole system works, but I want to improve the performance of the batch part: if I have 100 million entities, I currently send them all to Kafka in one foreach pass, which makes no sense for the downstream streaming application. I want to send, for example, 10 million events to Kafka at a time.

I have a query like:

select ... from table1
  left outer join table2 on ...
  left outer join table3 on ...
  left outer join table4 on ...

My goal is to paginate table1: take 10 million rows into a separate RDD, do the joins, send the result to Kafka, then take the next 10 million and repeat. All tables are stored in Parquet format in HDFS.

I am thinking of using the toLocalIterator method, something like the following, but I have doubts about memory and parallelism, and surely there is a better way to do it:

rdd.toLocalIterator.grouped(1000).foreach { seq =>
  val batch: RDD[(String, Int)] = sc.parallelize(seq)
  // Do the processing
}

What do you think? Regards.

--
Gaspar Muñoz
@gmunozsoria
http://www.stratio.com/
Vía de las dos Castillas, 33, Ática 4, 3ª Planta, 28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // @stratiobd https://twitter.com/StratioBD
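The chunking idea above can be sketched in plain Python (this only illustrates the batching logic, not Spark API; 25 rows and a batch size of 10 stand in for 100 million rows and batches of 10 million):

```python
from itertools import islice

def batched(rows, size):
    """Yield successive lists of at most `size` items from `rows`,
    mimicking Scala's Iterator.grouped over a local iterator."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Each chunk would be re-parallelized, joined, and sent to Kafka in turn.
sizes = [len(chunk) for chunk in batched(range(25), 10)]
print(sizes)  # [10, 10, 5]
```

Note that toLocalIterator funnels every row of table1 through the driver; an alternative worth considering is to slice table1 by a range or hash of its join key, so each batch's joins stay distributed and never pass through the driver at all.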
Re: java.lang.ClassNotFoundException
Have you tried including the package name in the class name?

Thanks

On Aug 8, 2015, at 12:00 AM, Yasemin Kaya godo...@gmail.com wrote:
> Hi, I have a small Spark program and I am getting an error that I don't understand. [...]
> java.lang.ClassNotFoundException: MonthlyAverage
Re: Spark on YARN
Hi Dustin,

Yes, there are enough resources available, and the same application run as a different user works fine, so I think it is something to do with permissions, but I can't work out where.

Thanks,
Jem

On Sat, 8 Aug 2015 at 17:35, Dustin Cote dc...@cloudera.com wrote:
> Hi Jem,
> At the top of the RM web UI, do you see any available resources to spawn the application master container?
> --
> Dustin Cote
> Customer Operations Engineer
> http://www.cloudera.com

--
You received this message because you are subscribed to the Google Groups CDH Users group. To unsubscribe from this group and stop receiving emails from it, send an email to cdh-user+unsubscr...@cloudera.org. For more options, visit https://groups.google.com/a/cloudera.org/d/optout.
Re: java.lang.ClassNotFoundException
Thanks Ted, I solved it :)

2015-08-08 14:07 GMT+03:00 Ted Yu yuzhih...@gmail.com:
> Have you tried including the package name in the class name?

--
hiç ender hiç
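For reference, the usual cause of this error: the class in the jar is declared inside a package (say `package weather` — illustrative, the gist's actual package name may differ), so `--class` needs the fully qualified name, e.g. `bin/spark-submit --class weather.MonthlyAverage --master local[4] weather.jar`. A small sketch of how a jar entry path (as printed by `jar tf weather.jar`) maps to the name spark-submit expects:

```python
def jar_entry_to_class_name(entry: str) -> str:
    """Map a jar entry like 'weather/MonthlyAverage.class' to the
    fully qualified name that spark-submit's --class flag expects."""
    if not entry.endswith(".class"):
        raise ValueError("not a class file entry: " + entry)
    return entry[: -len(".class")].replace("/", ".")

# Package name here is assumed for illustration:
print(jar_entry_to_class_name("weather/MonthlyAverage.class"))  # weather.MonthlyAverage
```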
Re: Spark on YARN
Which scheduler is your cluster using? Check the scheduler tab in the RM UI for your user and that user's maximum vcore limit. If that user's other running applications have already consumed its maximum vcores, YARN will not allocate vcores to new applications from that user, while the same application submitted by another user runs fine because the other user's limit has not been reached.

On Sat, Aug 8, 2015 at 10:07 PM, Jem Tucker jem.tuc...@gmail.com wrote:
> Hi Dustin,
> Yes, there are enough resources available, and the same application run as a different user works fine, so I think it is something to do with permissions, but I can't work out where.
> Thanks, Jem
Re: DataFrame column structure change
You can use the struct function of the org.apache.spark.sql.functions class to combine two columns into a struct column. Something like:

val nestedCol = struct(df("d"), df("e"))
df.select(df("a"), df("b"), df("c"), nestedCol)

On Aug 7, 2015 3:14 PM, Rishabh Bhardwaj rbnex...@gmail.com wrote:
> I am doing it by creating a new DataFrame out of the fields to be nested and then joining with the original DF. Looking for a more optimized solution here.
>
> On Fri, Aug 7, 2015 at 2:06 PM, Rishabh Bhardwaj rbnex...@gmail.com wrote:
>> Hi all,
>> I want to build some nesting structure from the existing columns of a DataFrame. I am trying to transform a DF in the following way, but couldn't do it:
>>
>> scala> df.printSchema
>> root
>>  |-- a: string (nullable = true)
>>  |-- b: string (nullable = true)
>>  |-- c: string (nullable = true)
>>  |-- d: string (nullable = true)
>>  |-- e: string (nullable = true)
>>  |-- f: string (nullable = true)
>>
>> to
>>
>> scala> newDF.printSchema
>> root
>>  |-- a: string (nullable = true)
>>  |-- b: string (nullable = true)
>>  |-- c: string (nullable = true)
>>  |-- newCol: struct (nullable = true)
>>  |    |-- d: string (nullable = true)
>>  |    |-- e: string (nullable = true)
>>
>> Please help me.
>> Regards,
>> Rishabh.
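The reshaping above can be pictured on a single row with plain Python dicts (column names taken from the thread; this only illustrates the shape the struct column produces, not Spark API):

```python
row = {"a": "1", "b": "2", "c": "3", "d": "4", "e": "5", "f": "6"}

# Keep a, b, c at the top level and fold d and e into a nested field,
# mirroring select(df("a"), df("b"), df("c"), struct(df("d"), df("e"))).
new_row = {k: row[k] for k in ("a", "b", "c")}
new_row["newCol"] = {"d": row["d"], "e": row["e"]}
print(new_row)
```

Note that column f is dropped entirely, matching the target schema in the question.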
Spark sql jobs n their partition
I have complex transformation requirements that I am implementing using DataFrames. It involves a lot of joins, including with a Cassandra table. I was wondering how I can debug the jobs and stages queued by Spark SQL the way I can for RDDs. In one case, Spark SQL creates more than 17 lakh (1.7 million) tasks for 2 GB of data, even though I have set the SQL shuffle partitions to 32.

Raghav
How to create DataFrame from a binary file?
Hi, how do we create a DataFrame from a binary file stored in HDFS? I was thinking of using:

JavaPairRDD<String, PortableDataStream> pairRdd = javaSparkContext.binaryFiles("/hdfs/path/to/binfile");
JavaRDD<PortableDataStream> javaRdd = pairRdd.values();

I can see that PortableDataStream has a method called toArray which can convert it into a byte array. If I have a JavaRDD<byte[]>, can I call the following and get a DataFrame?

DataFrame binDataFrame = sqlContext.createDataFrame(javaBinRdd, Byte.class);

Please guide me, I am new to Spark. I have my own custom format, which is a binary format, and I was thinking that if I can convert my custom format into a DataFrame using binary operations, then I don't need to create my own custom Hadoop format. Am I on the right track? Will reading binary data into a DataFrame scale?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-DataFrame-from-a-binary-file-tp24179.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
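One general approach here: a column of raw byte[] gives Spark SQL nothing to query, so the usual first step is to decode each binary record into typed fields and build rows from those. A hedged sketch of the decoding step, using a hypothetical fixed-width record layout (4-byte little-endian int id plus 4-byte float value — purely illustrative, your custom format will differ):

```python
import struct

# Hypothetical record layout: int32 id, float32 value (little-endian).
RECORD = struct.Struct("<if")

def decode_records(blob: bytes):
    """Unpack a buffer of fixed-width records into (id, value) tuples."""
    return [RECORD.unpack_from(blob, off) for off in range(0, len(blob), RECORD.size)]

blob = struct.pack("<if", 1, 2.5) + struct.pack("<if", 2, 0.5)
print(decode_records(blob))  # [(1, 2.5), (2, 0.5)]
```

Each decoded tuple can then back a row with a proper schema. Whether this scales depends mostly on file sizes: binaryFiles reads each file whole into one partition, so many small files work better than a few huge ones unless you switch to a record-oriented input format.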
Re: Spark master driver UI: How to keep it after process finished?
Hi Saif,

You need to run your application with spark.eventLog.enabled set to true. Then, if you are using standalone mode, you can view the Master UI at port 8080. Otherwise, you can start a history server through sbin/start-history-server.sh, which by default serves the history UI at port 18080. For more information on how to set this up, visit http://spark.apache.org/docs/latest/monitoring.html

-Andrew

2015-08-07 13:16 GMT-07:00 François Pelletier newslett...@francoispelletier.org:
> Look at spark.history.ui.port if you use standalone, or spark.yarn.historyServer.address if you use YARN, in your Spark config file. Mine is located at /etc/spark/conf/spark-defaults.conf. If you use Apache Ambari, you can find these settings in the Spark / Configs / Advanced spark-defaults tab.
> François
>
> On 2015-08-07 15:58, saif.a.ell...@wellsfargo.com wrote:
>> Hello, thank you, but that port is unreachable for me. Can you please share where I can find the equivalent port in my environment? Thank you, Saif
>>
>> From: François Pelletier [mailto:newslett...@francoispelletier.org]
>> Sent: Friday, August 07, 2015 4:38 PM
>> To: user@spark.apache.org
>> Subject: Re: Spark master driver UI: How to keep it after process finished?
>>
>> Hi, all Spark processes are saved in the Spark History Server. Look at your host on port 18080 instead of 4040.
>> François
>>
>> On 2015-08-07 15:26, saif.a.ell...@wellsfargo.com wrote:
>>> Hi, a silly question here. The Driver Web UI dies when the spark-submit program finishes. I would like some time to analyze after the program ends, as the page does not refresh itself; when I hit F5 I lose all the info.
>>> Thanks, Saif
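The steps Andrew describes correspond to settings along these lines in conf/spark-defaults.conf (the log directory path is illustrative; it must exist and be writable by the application before the first run):

```
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-logs
spark.history.fs.logDirectory    hdfs:///spark-logs
```

With these set, sbin/start-history-server.sh serves completed applications at port 18080 even after spark-submit exits.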
Re: Schema change on Spark Hive (Parquet file format) table not working
Yes, I've found a number of problems with metadata management in Spark SQL. One core issue is SPARK-9764 (https://issues.apache.org/jira/browse/SPARK-9764). Related issues are SPARK-9342 (https://issues.apache.org/jira/browse/SPARK-9342), SPARK-9761 (https://issues.apache.org/jira/browse/SPARK-9761) and SPARK-9762 (https://issues.apache.org/jira/browse/SPARK-9762). I've also observed a case where, after an exception in ALTER TABLE, Spark SQL thought a table had 0 rows while, in fact, all the data was still there. I was not able to reproduce this one reliably, so I did not create a JIRA issue for it. Let's vote for these issues and get them resolved.
Re: Spark inserting into parquet files with different schema
Adam, did you find a solution for this?