Re: How to pass parameters to a spark-jobserver Scala class?

2015-02-19 Thread Vasu C
Hi Sasi,

I am not sure about Vaadin, but a quick search turns up many articles on how
to send JSON parameters over HTTP, for example:

http://stackoverflow.com/questions/21404252/post-request-send-json-data-java-httpurlconnection
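
For example, here is a minimal sketch of POSTing a JSON body with plain
java.net.HttpURLConnection from Scala (the endpoint URL and payload are only
placeholders, not anything jobserver-specific):

import java.io.OutputStreamWriter
import java.net.{HttpURLConnection, URL}

object PostJsonExample {
  def main(args: Array[String]): Unit = {
    // Placeholder endpoint; point this at your own HTTP service.
    val url = new URL("http://localhost:8090/jobs")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)

    // Placeholder JSON payload.
    val payload = """{"input.string": "a b c a b see"}"""
    val writer = new OutputStreamWriter(conn.getOutputStream, "UTF-8")
    try writer.write(payload) finally writer.close()

    println("HTTP response code: " + conn.getResponseCode)
  }
}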

You can also try Finagle, Twitter's fault-tolerant RPC framework.


Regards,
   Vasu C



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-pass-parameters-to-a-spark-jobserver-Scala-class-tp21671p21727.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to pass parameters to a spark-jobserver Scala class?

2015-02-18 Thread Vasu C
Hi Sasi,

Forgot to mention that the job server uses the Typesafe Config library. The
input format is HOCON, a superset of JSON; you can find the syntax at the
link below:

https://github.com/typesafehub/config
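
For example, a small sketch of parsing and reading values with the Typesafe
Config API (the keys and values here are only illustrative):

import com.typesafe.config.{Config, ConfigFactory}

object ConfigExample {
  def main(args: Array[String]): Unit = {
    // HOCON accepts JSON as well as the relaxed syntax below;
    // dotted keys such as input.string become nested paths.
    val config: Config = ConfigFactory.parseString(
      "input.string = a b c a b see\nstarttime = 1424304000")

    println(config.getString("input.string")) // a b c a b see
    println(config.getLong("starttime"))      // 1424304000
  }
}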



Regards,
   Vasu C



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-pass-parameters-to-a-spark-jobserver-Scala-class-tp21671p21695.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



JsonRDD to parquet -- data loss

2015-02-17 Thread Vasu C
Hi,

I am running a Spark batch-processing job using the spark-submit command; my
code snippet is below. It basically converts a JsonRDD to Parquet and stores
it at an HDFS location.

The problem I am facing: if multiple jobs are triggered in parallel, then
even though each job executes properly (as far as I can see in the Spark web
UI), no Parquet file is created at the HDFS path for some of them. If 5 jobs
are executed in parallel, only 3 Parquet files get created.

Is this a data-loss scenario, or am I missing something here? Please help me
with this.

Here tableName is unique, with a timestamp appended to it.


// Create the SQL context and load the incoming JSON records.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val jsonRdd = sqlContext.jsonRDD(results)

// Register the existing Parquet file as a temp table,
// then append the JSON rows to it.
val parquetTable = sqlContext.parquetFile(parquetFilePath)
parquetTable.registerTempTable(tableName)
jsonRdd.insertInto(tableName)
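
As a possible workaround I am considering giving each job its own output
location instead of appending through a registered table, along these lines
(a sketch only; uniqueOutputPath is a hypothetical layout derived from the
timestamped tableName):

// Each parallel job writes to its own directory, so writers never race
// on the same Parquet location.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val jsonRdd = sqlContext.jsonRDD(results)

// Hypothetical per-job output path; adjust to your HDFS structure.
val uniqueOutputPath = parquetFilePath + "/" + tableName
jsonRdd.saveAsParquetFile(uniqueOutputPath)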


Regards,

  Vasu C


Re: How to pass parameters to a spark-jobserver Scala class?

2015-02-17 Thread Vasu C
Hi Sasi,

To pass parameters to spark-jobserver, use curl -d "input.string = a b c
a b see" when submitting the job, and in the job server class read it with
config.getString("input.string").

You can pass multiple parameters, like starttime and endtime, and use
config.getString() to get each value.

The examples are shown here
https://github.com/spark-jobserver/spark-jobserver
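
For reference, here is a minimal job along the lines of the WordCountExample
in the jobserver README (a sketch; the trait and class names are the ones
used by the spark-jobserver project):

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

import scala.util.Try

object WordCountExample extends SparkJob {
  // Reject the job early if the expected parameter is missing.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation = {
    Try(config.getString("input.string"))
      .map(_ => SparkJobValid)
      .getOrElse(SparkJobInvalid("No input.string config param"))
  }

  // The parameters posted with curl -d arrive in the config argument.
  override def runJob(sc: SparkContext, config: Config): Any = {
    val words = config.getString("input.string").split(" ").toSeq
    sc.parallelize(words).countByValue
  }
}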


Regards,
   Vasu C



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-pass-parameters-to-a-spark-jobserver-Scala-class-tp21671p21692.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark/HIVE Insert Into values Error

2014-11-13 Thread Vasu C
Hi Arthur,

May I know what the solution is? I have similar requirements.
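
In the meantime, is the usual workaround to insert through a temporary table,
something like the sketch below (assuming Spark 1.1 with a HiveContext; the
staging table name and Student class are only illustrative)?

import org.apache.spark.sql.hive.HiveContext

case class Student(name: String, age: Int, gpa: Int)

val hiveContext = new HiveContext(sc)
import hiveContext.createSchemaRDD

// Build the rows in Spark and register them as a temp table...
val rows = sc.parallelize(Seq(Student("fred flintstone", 35, 1)))
rows.registerTempTable("students_staging")

// ...then use INSERT INTO ... SELECT, which Hive 0.12 does support.
hiveContext.sql(
  "INSERT INTO TABLE students SELECT name, age, gpa FROM students_staging")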


Regards,
  Vasu C

On Sun, Oct 26, 2014 at 12:09 PM, arthur.hk.c...@gmail.com wrote:

 Hi,

 I have already found the way to do “INSERT INTO HIVE_TABLE VALUES (…..)”.

 Regards
 Arthur

 On 18 Oct, 2014, at 10:09 pm, Cheng Lian lian.cs@gmail.com wrote:

  Currently Spark SQL uses Hive 0.12.0, which doesn't support the INSERT
 INTO ... VALUES ... syntax.

 On 10/18/14 1:33 AM, arthur.hk.c...@gmail.com wrote:

 Hi,

 When trying to insert records into Hive, I got an error.

 My Spark is 1.1.0 and Hive is 0.12.0.

 Any idea what could be wrong?
 Regards
 Arthur



  hive> CREATE TABLE students (name VARCHAR(64), age INT, gpa int);

 OK

  hive> INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1);
 NoViableAltException(26@[])
   at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:693)
   at org.apache.hadoop.hive.ql.parse.HiveParser.selectClause(HiveParser.java:31374)
   at org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:29083)
   at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:28968)
   at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:28762)
   at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1238)
   at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:938)
   at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 FAILED: ParseException line 1:27 cannot recognize input near 'VALUES' '(' ''fred flintstone'' in select clause







Re: JavaStreamingContextFactory checkpoint directory NotSerializableException

2014-11-06 Thread Vasu C
Thanks for pointing to the issue.

Yes, I think it's the same issue; the exception is below:


ERROR OneForOneStrategy: TestCheckpointStreamingJson$1
java.io.NotSerializableException: TestCheckpointStreamingJson
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
  at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
  at java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
  at org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:168)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
  at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
  at org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
  at org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:259)
  at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:167)
  at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:76)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
  at akka.actor.ActorCell.invoke(ActorCell.scala:456)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


Regards,
  Vasu C

On Thu, Nov 6, 2014 at 1:14 PM, Sean Owen so...@cloudera.com wrote:

 You didn't say what isn't serializable or where the exception occurs,
 but, is it the same as this issue?
 https://issues.apache.org/jira/browse/SPARK-4196

 On Thu, Nov 6, 2014 at 5:42 AM, Vasu C vasuc.bigd...@gmail.com wrote:
  Dear All,
 
  I am getting java.io.NotSerializableException for the code below. If the
  jssc.checkpoint(HDFS_CHECKPOINT_DIR) call is commented out, there is no
  exception. Please help.
 
  JavaStreamingContextFactory contextFactory = new JavaStreamingContextFactory() {
      @Override
      public JavaStreamingContext create() {
          SparkConf sparkConf = new SparkConf().set("spark.cores.max", "3");

          final JavaStreamingContext jssc = new JavaStreamingContext(
                  sparkConf, new Duration(300));

          final JavaHiveContext javahiveContext = new JavaHiveContext(
                  jssc.sc());

          javahiveContext.createParquetFile(Bean.class,
                  IMPALA_TABLE_LOC, true, new Configuration())
                  .registerTempTable(TEMP_TABLE_NAME);

          // TODO create checkpoint directory for fault tolerance

Re: JavaStreamingContextFactory checkpoint directory NotSerializableException

2014-11-06 Thread Vasu C
Hi Sean,

Below is my Java code, running on Spark 1.1.0, and I am still getting the
same error. Here the Bean class is serializable, so I am not sure where
exactly the problem is. What am I doing wrong here?

public class StreamingJson {
    public static void main(String[] args) throws Exception {
        final String HDFS_FILE_LOC = args[0];
        final String IMPALA_TABLE_LOC = args[1];
        final String TEMP_TABLE_NAME = args[2];
        final String HDFS_CHECKPOINT_DIR = args[3];

        JavaStreamingContextFactory contextFactory = new JavaStreamingContextFactory() {
            public JavaStreamingContext create() {
                SparkConf sparkConf = new SparkConf().setAppName("test")
                        .set("spark.cores.max", "3");

                final JavaStreamingContext jssc = new JavaStreamingContext(
                        sparkConf, new Duration(500));

                final JavaHiveContext javahiveContext = new JavaHiveContext(
                        jssc.sc());

                javahiveContext.createParquetFile(Bean.class,
                        IMPALA_TABLE_LOC, true, new Configuration())
                        .registerTempTable(TEMP_TABLE_NAME);

                final JavaDStream<String> textFileStream = jssc
                        .textFileStream(HDFS_FILE_LOC);

                // NOTE: this anonymous Function2 is declared inside create(), so it
                // holds an implicit reference to the enclosing anonymous factory
                // instance; when checkpointing serializes the DStream graph, that
                // enclosing instance (and anything it references) may be dragged
                // into serialization as well.
                textFileStream
                        .foreachRDD(new Function2<JavaRDD<String>, Time, Void>() {

                            @Override
                            public Void call(JavaRDD<String> rdd, Time time)
                                    throws Exception {
                                if (rdd != null) {
                                    if (rdd.count() > 0) {
                                        JavaSchemaRDD schRdd = javahiveContext
                                                .jsonRDD(rdd);
                                        schRdd.insertInto(TEMP_TABLE_NAME);
                                    }
                                }
                                return null;
                            }
                        });
                jssc.checkpoint(HDFS_CHECKPOINT_DIR);
                return jssc;
            }
        };
        JavaStreamingContext context = JavaStreamingContext.getOrCreate(
                HDFS_CHECKPOINT_DIR, contextFactory);
        context.start(); // Start the computation
        context.awaitTermination();
    }
}



Regards,
   Vasu C

On Thu, Nov 6, 2014 at 1:33 PM, Sean Owen so...@cloudera.com wrote:

 No, not the same thing then. This just means you accidentally have a
 reference to the unserializable enclosing test class in your code.
 Just make sure the reference is severed.

 On Thu, Nov 6, 2014 at 8:00 AM, Vasu C vasuc.bigd...@gmail.com wrote:
  Thanks for pointing to the issue.
 
  Yes, I think it's the same issue; the exception is below:
 
 
  ERROR OneForOneStrategy: TestCheckpointStreamingJson$1
  java.io.NotSerializableException: TestCheckpointStreamingJson



JavaStreamingContextFactory checkpoint directory NotSerializableException

2014-11-05 Thread Vasu C
Dear All,

I am getting java.io.NotSerializableException for the code below. If the
jssc.checkpoint(HDFS_CHECKPOINT_DIR) call is commented out, there is no
exception. Please help.

JavaStreamingContextFactory contextFactory = new JavaStreamingContextFactory() {
    @Override
    public JavaStreamingContext create() {
        SparkConf sparkConf = new SparkConf().set("spark.cores.max", "3");

        final JavaStreamingContext jssc = new JavaStreamingContext(
                sparkConf, new Duration(300));

        final JavaHiveContext javahiveContext = new JavaHiveContext(
                jssc.sc());

        javahiveContext.createParquetFile(Bean.class,
                IMPALA_TABLE_LOC, true, new Configuration())
                .registerTempTable(TEMP_TABLE_NAME);

        // TODO create checkpoint directory for fault tolerance
        final JavaDStream<String> textFileStream = jssc
                .textFileStream(HDFS_FILE_LOC);

        textFileStream
                .foreachRDD(new Function2<JavaRDD<String>, Time, Void>() {

                    @Override
                    public Void call(JavaRDD<String> rdd, Time time)
                            throws Exception {
                        if (rdd != null) {
                            if (rdd.count() > 0) {
                                // Convert the JSON lines to a SchemaRDD and
                                // append them to the registered temp table.
                                JavaSchemaRDD schRdd = javahiveContext
                                        .jsonRDD(rdd);
                                schRdd.insertInto(TEMP_TABLE_NAME);
                            }
                        }
                        return null;
                    }
                });
        jssc.checkpoint(HDFS_CHECKPOINT_DIR);
        return jssc;
    }
};

// Get JavaStreamingContext from checkpoint data or create a new one
JavaStreamingContext context = JavaStreamingContext.getOrCreate(
        HDFS_CHECKPOINT_DIR, contextFactory);

context.start(); // Start the computation
context.awaitTermination();



Regards,
   Vasu