Re: How to pass parameters to a spark-jobserver Scala class?
Hi Sasi, I am not sure about Vaadin, but a simple Google search turns up many articles on how to pass JSON parameters over HTTP, e.g. http://stackoverflow.com/questions/21404252/post-request-send-json-data-java-httpurlconnection You can also try Finagle, a fault-tolerant RPC framework from Twitter. Regards, Vasu C -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-pass-parameters-to-a-spark-jobserver-Scala-class-tp21671p21727.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
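[Editor's note] A minimal, self-contained sketch of the HttpURLConnection approach from the linked Stack Overflow answer. The `/jobs` path and the JSON body are illustrative only; a tiny local echo server (JDK built-in `com.sun.net.httpserver`) stands in for the real job server so the example can run anywhere.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class JsonPost {
    public static void main(String[] args) throws Exception {
        // Tiny local echo server standing in for the job server endpoint.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/jobs", exchange -> {
            byte[] body = exchange.getRequestBody().readAllBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();

        // Hypothetical parameter payload; real key names depend on your job.
        String json = "{\"input.string\": \"a b c a b see\"}";
        URL url = new URL("http://localhost:" + server.getAddress().getPort() + "/jobs");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(json.getBytes(StandardCharsets.UTF_8));
        }

        // Read back the echoed response to confirm the POST round trip.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) response.append(line);
            System.out.println(conn.getResponseCode() + " " + response);
        }
        server.stop(0);
    }
}
```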
Re: How to pass parameters to a spark-jobserver Scala class?
Hi Sasi, I forgot to mention that the job server uses the Typesafe Config library. The input is JSON; you can find the syntax at the link below. https://github.com/typesafehub/config Regards, Vasu C -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-pass-parameters-to-a-spark-jobserver-Scala-class-tp21671p21695.html
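[Editor's note] The Typesafe Config library parses HOCON, a superset of JSON, so both JSON-style and the looser HOCON syntax are accepted. A small sketch of a config fragment; all key names here are illustrative, not part of the job server API:

```hocon
# Plain key = value (HOCON) -- quotes optional around the key path
input.string = "a b c a b see"

# Nested form, equivalent to job.starttime / job.endtime dotted keys
job {
  starttime = "2015-01-01T00:00:00"
  endtime   = "2015-01-02T00:00:00"
}
```

In the job class these would be read with `config.getString("input.string")`, `config.getString("job.starttime")`, and so on.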
JsonRDD to parquet -- data loss
Hi, I am running a Spark batch processing job via the spark-submit command; my code snippet is below. It basically converts a JSON RDD to Parquet and stores it in an HDFS location. The problem I am facing is that when multiple jobs are triggered in parallel, even though each job executes properly (as I can see in the Spark web UI), not every Parquet file is created in the HDFS path: if 5 jobs are executed in parallel, only 3 Parquet files get created. Is this a data loss scenario, or am I missing something here? Please help me with this. Here tableName is unique, with a timestamp appended to it.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val jsonRdd = sqlContext.jsonRDD(results)
val parquetTable = sqlContext.parquetFile(parquetFilePath)
parquetTable.registerTempTable(tableName)
jsonRdd.insertInto(tableName)

Regards, Vasu C
Re: How to pass parameters to a spark-jobserver Scala class?
Hi Sasi, To pass parameters to spark-jobserver, use curl -d "input.string = a b c a b see", and in the job server class use config.getString("input.string"). You can pass multiple parameters, like starttime and endtime, and use config.getString() to get the values. The examples are shown here: https://github.com/spark-jobserver/spark-jobserver Regards, Vasu C -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-pass-parameters-to-a-spark-jobserver-Scala-class-tp21671p21692.html
Re: Spark/HIVE Insert Into values Error
Hi Arthur, May I know what the solution was? I have similar requirements. Regards, Vasu C

On Sun, Oct 26, 2014 at 12:09 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:
Hi, I have already found the way to "insert into HIVE_TABLE values (…..)". Regards Arthur

On 18 Oct, 2014, at 10:09 pm, Cheng Lian lian.cs@gmail.com wrote:
Currently Spark SQL uses Hive 0.12.0, which doesn't support the INSERT INTO ... VALUES ... syntax.

On 10/18/14 1:33 AM, arthur.hk.c...@gmail.com wrote:
Hi, When trying to insert records into Hive, I got an error. My Spark is 1.1.0 and Hive is 0.12.0. Any idea what could be wrong? Regards Arthur

hive> CREATE TABLE students (name VARCHAR(64), age INT, gpa int);
OK
hive> INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1);
NoViableAltException(26@[])
    at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:693)
    at org.apache.hadoop.hive.ql.parse.HiveParser.selectClause(HiveParser.java:31374)
    at org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:29083)
    at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:28968)
    at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:28762)
    at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1238)
    at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:938)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:27 cannot recognize input near 'VALUES' '(' ''fred flintstone'' in select clause
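[Editor's note] INSERT INTO ... VALUES support only arrived in Hive 0.14, so on Hive 0.12 the usual workarounds are selecting literals from an existing table or loading rows from a file. A sketch; the one-row helper table `dual` and the file path are hypothetical:

```sql
-- Workaround 1: select literals through an existing table
-- (Hive 0.12 requires a FROM clause; `dual` is any table with >= 1 row)
INSERT INTO TABLE students
SELECT 'fred flintstone', 35, 1 FROM dual LIMIT 1;

-- Workaround 2: load delimited rows from a file
LOAD DATA LOCAL INPATH '/tmp/students.txt' INTO TABLE students;
```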
Re: JavaStreamingContextFactory checkpoint directory NotSerializableException
Thanks for pointing to the issue. Yes, I think it's the same issue; the exception is below.

ERROR OneForOneStrategy: TestCheckpointStreamingJson$1
java.io.NotSerializableException: TestCheckpointStreamingJson
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
    at org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:168)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
    at org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
    at org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:259)
    at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:167)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:76)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Regards, Vasu C

On Thu, Nov 6, 2014 at 1:14 PM, Sean Owen so...@cloudera.com wrote:
You didn't say what isn't serializable or where the exception occurs, but is it the same as this issue? https://issues.apache.org/jira/browse/SPARK-4196

On Thu, Nov 6, 2014 at 5:42 AM, Vasu C vasuc.bigd...@gmail.com wrote:
Dear All, I am getting java.io.NotSerializableException for the code below. If the jssc.checkpoint(HDFS_CHECKPOINT_DIR); line is commented out, there is no exception. Please help.

JavaStreamingContextFactory contextFactory = new JavaStreamingContextFactory() {
    @Override
    public JavaStreamingContext create() {
        SparkConf sparkConf = new SparkConf().set("spark.cores.max", "3");
        final JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(300));
        final JavaHiveContext javahiveContext = new JavaHiveContext(jssc.sc());
        javahiveContext.createParquetFile(Bean.class, IMPALA_TABLE_LOC, true, new Configuration())
                .registerTempTable(TEMP_TABLE_NAME);
        // TODO create checkpoint directory for fault tolerance
Re: JavaStreamingContextFactory checkpoint directory NotSerializableException
Hi Sean, Below is my Java code, using Spark 1.1.0, and I am still getting the same error. Here the Bean class is serializable. Not sure where exactly the problem is; what am I doing wrong here?

public class StreamingJson {
    public static void main(String[] args) throws Exception {
        final String HDFS_FILE_LOC = args[0];
        final String IMPALA_TABLE_LOC = args[1];
        final String TEMP_TABLE_NAME = args[2];
        final String HDFS_CHECKPOINT_DIR = args[3];
        JavaStreamingContextFactory contextFactory = new JavaStreamingContextFactory() {
            public JavaStreamingContext create() {
                SparkConf sparkConf = new SparkConf().setAppName("test").set("spark.cores.max", "3");
                final JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(500));
                final JavaHiveContext javahiveContext = new JavaHiveContext(jssc.sc());
                javahiveContext.createParquetFile(Bean.class, IMPALA_TABLE_LOC, true, new Configuration())
                        .registerTempTable(TEMP_TABLE_NAME);
                final JavaDStream<String> textFileStream = jssc.textFileStream(HDFS_FILE_LOC);
                textFileStream.foreachRDD(new Function2<JavaRDD<String>, Time, Void>() {
                    @Override
                    public Void call(JavaRDD<String> rdd, Time time) throws Exception {
                        if (rdd != null) {
                            if (rdd.count() > 0) {
                                JavaSchemaRDD schRdd = javahiveContext.jsonRDD(rdd);
                                schRdd.insertInto(TEMP_TABLE_NAME);
                            }
                        }
                        return null;
                    }
                });
                jssc.checkpoint(HDFS_CHECKPOINT_DIR);
                return jssc;
            }
        };
        JavaStreamingContext context = JavaStreamingContext.getOrCreate(HDFS_CHECKPOINT_DIR, contextFactory);
        context.start(); // Start the computation
        context.awaitTermination();
    }
}

Regards, Vasu C

On Thu, Nov 6, 2014 at 1:33 PM, Sean Owen so...@cloudera.com wrote:
No, not the same thing then. This just means you accidentally have a reference to the unserializable enclosing test class in your code. Just make sure the reference is severed.

On Thu, Nov 6, 2014 at 8:00 AM, Vasu C vasuc.bigd...@gmail.com wrote:
Thanks for pointing to the issue. Yes, I think it's the same issue; below is the exception:
ERROR OneForOneStrategy: TestCheckpointStreamingJson$1
java.io.NotSerializableException: TestCheckpointStreamingJson
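[Editor's note] Sean's point about a hidden reference to the enclosing class is the crux of this thread, and it can be demonstrated without Spark at all. The sketch below (class and field names are made up) shows that an anonymous inner class created inside an instance method silently captures its non-serializable outer instance and fails to serialize, while an equivalent static nested class serializes fine:

```java
import java.io.*;

public class CaptureDemo {
    // A serializable task type, standing in for a Spark closure.
    interface SerializableRunnable extends Runnable, Serializable { }

    // Not serializable, like TestCheckpointStreamingJson in the stack trace above.
    static class Outer {
        String name = "outer";

        // The anonymous class reads `name`, so it holds a hidden
        // reference (this$0) to the enclosing Outer instance.
        SerializableRunnable anonymous() {
            return new SerializableRunnable() {
                public void run() { System.out.println(name); }
            };
        }
    }

    // A static nested class has no hidden outer reference.
    static class StandaloneTask implements SerializableRunnable {
        public void run() { }
    }

    static String trySerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return "ok";
        } catch (NotSerializableException e) {
            return "NotSerializableException: " + e.getMessage();
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Fails: drags the non-serializable Outer instance along.
        System.out.println(trySerialize(new Outer().anonymous()));
        // Succeeds: nothing captured beyond the task itself.
        System.out.println(trySerialize(new StandaloneTask()));
    }
}
```

This is why moving the streaming logic into a top-level or static class (severing the outer reference) makes the checkpoint serialization succeed.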
JavaStreamingContextFactory checkpoint directory NotSerializableException
Dear All, I am getting java.io.NotSerializableException for the code below. If the jssc.checkpoint(HDFS_CHECKPOINT_DIR); line is commented out, there is no exception. Please help.

JavaStreamingContextFactory contextFactory = new JavaStreamingContextFactory() {
    @Override
    public JavaStreamingContext create() {
        SparkConf sparkConf = new SparkConf().set("spark.cores.max", "3");
        final JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(300));
        final JavaHiveContext javahiveContext = new JavaHiveContext(jssc.sc());
        javahiveContext.createParquetFile(Bean.class, IMPALA_TABLE_LOC, true, new Configuration())
                .registerTempTable(TEMP_TABLE_NAME);
        // TODO create checkpoint directory for fault tolerance
        final JavaDStream<String> textFileStream = jssc.textFileStream(HDFS_FILE_LOC);
        textFileStream.foreachRDD(new Function2<JavaRDD<String>, Time, Void>() {
            @Override
            public Void call(JavaRDD<String> rdd, Time time) throws Exception {
                if (rdd != null) {
                    if (rdd.count() > 0) {
                        JavaSchemaRDD schRdd = javahiveContext.jsonRDD(rdd);
                        schRdd.insertInto(TEMP_TABLE_NAME);
                    }
                }
                return null;
            }
        });
        jssc.checkpoint(HDFS_CHECKPOINT_DIR);
        return jssc;
    }
};

// Get JavaStreamingContext from checkpoint data or create a new one
JavaStreamingContext context = JavaStreamingContext.getOrCreate(HDFS_CHECKPOINT_DIR, contextFactory);
context.start(); // Start the computation
context.awaitTermination();

Regards, Vasu