Streaming JSON string from REST Api in Spring

2014-03-06 Thread sonyjv
Hi,

I am very new to Spark and am currently trying to implement a use case. We
have a JSON-based REST API implemented in Spring which gets around 50
calls/sec. I would like to stream these JSON strings to Spark for processing
and aggregation. We have a strict SLA and would like to know the best way to
design the interface between the REST API and Spark.
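
One option I am considering is to decouple the two with a queue such as
Kafka: the Spring controller publishes each incoming JSON string to a topic,
and Spark consumes that topic. A minimal, hypothetical sketch of the
producer side using the Kafka 0.8 Java API (the class name, topic name, and
broker address are placeholders):

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Hypothetical helper the Spring controller would delegate to: each
// incoming JSON request body is published to a topic that Spark consumes.
public class RequestPublisher {

    private final Producer<String, String> producer;

    public RequestPublisher() {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092"); // placeholder broker
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        producer = new Producer<String, String>(new ProducerConfig(props));
    }

    public void publish(String json) {
        producer.send(new KeyedMessage<String, String>("requests", json));
    }
}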

Also, the processing part has several steps, and I am thinking of having
multiple Spark jobs perform these steps. What is the best way of triggering
one job from another and passing data between these jobs?

Thanks,
Sony
  





Re: Streaming JSON string from REST Api in Spring

2014-03-06 Thread sonyjv
Thanks, Mayur, for your response.

I think I need to clarify the first part of my query. The JSON-based REST
API will be called by external interfaces, and these requests need to be
processed in a streaming fashion in Spark. I am not clear about the
following points:

1. How can the JSON request strings (50 per sec) be continuously streamed
to Spark?
2. The processing of each request will not take long, but it would need to
be split into multiple steps to render a fast initial response. So, for
coordinating the Spark jobs, do I have to use Kafka or some other queue, or
can I stream directly from one job to another? A rough sketch of what I am
picturing follows.
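
Assuming the Kafka receiver from the spark-streaming-kafka module is a fit,
this is roughly what I am picturing on the consuming side (the master URL,
ZooKeeper quorum, group id, and topic name are placeholders):

import java.util.HashMap;
import java.util.Map;

import scala.Tuple2;

import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class RequestStreamJob {
    public static void main(String[] args) {
        // 1-second batches against a standalone master (placeholder URL)
        JavaStreamingContext jssc = new JavaStreamingContext(
            "spark://master:7077", "RequestStreamJob", new Duration(1000));

        Map<String, Integer> topics = new HashMap<String, Integer>();
        topics.put("requests", 1); // placeholder topic, one receiver thread

        JavaPairDStream<String, String> messages = KafkaUtils.createStream(
            jssc, "zkhost:2181", "request-group", topics);

        // Step 1: extract the JSON payload from each (key, message) pair.
        JavaDStream<String> json = messages.map(
            new Function<Tuple2<String, String>, String>() {
                public String call(Tuple2<String, String> t) {
                    return t._2();
                }
            });

        // Later steps chain as further transformations on the same
        // DStream, so no external queue is needed between them.
        json.print();

        jssc.start();
        jssc.awaitTermination();
    }
}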

Regards,
Sony





Re: Streaming JSON string from REST Api in Spring

2014-03-10 Thread sonyjv
Thanks, Mayur, for your clarification.





Cassandra CQL read/write from spark using Java - [Remoting] Remoting error: [Startup timed out]

2014-03-20 Thread sonyjv
Hi,

In standalone mode I am trying to perform some Cassandra CQL read/write
operations. My Maven dependencies are as follows:

 
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>0.9.0-incubating</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.0.0-cdh4.5.0</version>
</dependency>
<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-all</artifactId>
    <version>2.0.6</version>
</dependency>

While initializing the JavaSparkContext I am getting the following error:

Exception in thread "main" java.util.concurrent.TimeoutException: Futures timed out after [1 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at akka.remote.Remoting.start(Remoting.scala:173)
    at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
    at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579)
    at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577)
    at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:96)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:126)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:139)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:47)
    at com.sample.spark.NewCassandraTest.main(NewCassandraTest.java:66)
[ERROR] [03/20/2014 14:34:50.429] [main] [Remoting] Remoting error: [Startup timed out] [
akka.remote.RemoteTransportException: Startup timed out
    at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
    at akka.remote.Remoting.start(Remoting.scala:191)
    at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
    at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579)
    at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577)
    at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:96)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:126)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:139)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:47)
    at com.sample.spark.NewCassandraTest.main(NewCassandraTest.java:66)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [1 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at akka.remote.Remoting.start(Remoting.scala:173)
    ... 11 more
]

I am clueless at the moment. Could you please let me know what the issue
might be?
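
For reference, the context creation itself is just the constructor call at
NewCassandraTest.java:66 in the trace above; a simplified, hypothetical
sketch of it (the master URL, Spark home, and jar path are placeholders):

import org.apache.spark.api.java.JavaSparkContext;

public class NewCassandraTest {
    public static void main(String[] args) {
        // The Remoting startup times out inside this constructor call.
        JavaSparkContext sc = new JavaSparkContext(
            "spark://master:7077",  // placeholder master URL
            "NewCassandraTest",
            "/opt/spark",           // placeholder Spark home on the workers
            new String[] { "target/cassandra-test.jar" }); // placeholder jar
    }
}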

Thanks,
Sony 





Re: Cassandra CQL read/write from spark using Java - [Remoting] Remoting error: [Startup timed out]

2014-03-20 Thread sonyjv
Hi,

I managed to solve the issue. The problem was a Netty version conflict (ref.
https://spark-project.atlassian.net/browse/SPARK-1138).

I changed the dependencies to add Netty exclusions and included Netty 3.6.6
as a direct dependency:


<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>${spark.version}</version>
    <exclusions>
        <exclusion>
            <groupId>io.netty</groupId>
            <artifactId>netty</artifactId>
            <!-- transitive version: 3.6.6.Final -->
        </exclusion>
        <exclusion>
            <groupId>io.netty</groupId>
            <artifactId>netty-all</artifactId>
            <!-- transitive version: 4.0.13.Final -->
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
    <exclusions>
        <exclusion>
            <groupId>io.netty</groupId>
            <artifactId>netty</artifactId>
            <!-- transitive version: 3.2.4.Final -->
        </exclusion>
        <exclusion>
            <groupId>asm</groupId>
            <artifactId>asm</artifactId>
            <!-- transitive version: 3.1 -->
        </exclusion>
        <exclusion>
            <groupId>org.jboss.netty</groupId>
            <artifactId>netty</artifactId>
            <!-- transitive version: 3.2.4.Final -->
        </exclusion>
    </exclusions>
</dependency>
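
(Running mvn dependency:tree afterwards is a quick way to confirm which
Netty versions actually end up on the classpath after these exclusions.)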





Streaming job having Cassandra query : OutOfMemoryError

2014-04-15 Thread sonyjv
Hi All,

I am desperately looking for some help.

My cluster has 6 nodes, each with dual cores and 8GB of RAM. The Spark
version running on the cluster is spark-0.9.0-incubating-bin-cdh4.

I am getting an OutOfMemoryError when running a Spark Streaming job (the
non-streaming version works fine) which queries a Cassandra table (a simple
query returning 3-4 rows) by connecting to the Spark standalone cluster
master.

java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.WritableUtils.readCompressedByteArray(WritableUtils.java:38)
    at org.apache.hadoop.io.WritableUtils.readCompressedString(WritableUtils.java:87)
    at org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:185)
    at org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2244)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
    at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)
    at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
Apr 15, 2014 6:53:39 PM org.apache.spark.Logging$class logInfo

The Spark job dependencies are:

 
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.3</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>0.9.0-incubating</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>0.9.0-incubating</version>
</dependency>
<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-all</artifactId>
    <version>2.0.6</version>
</dependency>
<dependency>
    <groupId>com.tuplejump</groupId>
    <artifactId>calliope_2.10</artifactId>
    <version>0.9.0-U1-C2-EA</version>
</dependency>

The various memory settings are configured as below.
spark.executor.memory = 4g
SPARK_MEM = 2g
SPARK_WORKER_MEMORY = 4g
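
For reference, a simplified sketch of how the streaming context is set up,
assuming the SparkConf-based constructor in 0.9 (the master URL and the
1-second batch interval are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class CassandraStreamingJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setMaster("spark://master:7077")     // placeholder master URL
            .setAppName("CassandraStreamingJob")
            .set("spark.executor.memory", "4g");  // mirrors the setting above

        JavaStreamingContext jssc =
            new JavaStreamingContext(conf, new Duration(1000));

        // DStream setup and the Cassandra query via Calliope would go here.

        jssc.start();
        jssc.awaitTermination();
    }
}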

Can you please let me know where I am going wrong?

Thanks,
Sony




