unsubscribe

2017-04-13 Thread tian zhang

Re: Spark streaming checkpoint against s3

2015-10-15 Thread Tian Zhang
So as long as the jar is kept on S3 and available across different runs, the S3 checkpoint works.

Spark streaming checkpoint against s3

2015-10-14 Thread Tian Zhang
Hi, I am trying to set the Spark Streaming checkpoint to S3. Here is what I did, basically: val checkpointDir = "s3://myBucket/checkpoint" val ssc = StreamingContext.getOrCreate(checkpointDir, () => getStreamingContext(sparkJobName,
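A minimal sketch of the getOrCreate pattern described above, assuming a hypothetical createContext factory, an illustrative bucket name, and a 60-second batch interval:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Illustrative checkpoint location; any reliable store (S3, HDFS) works.
    val checkpointDir = "s3://myBucket/checkpoint"

    // Factory that builds a fresh StreamingContext and registers the checkpoint dir.
    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("sparkJobName")
      val ssc = new StreamingContext(conf, Seconds(60))
      ssc.checkpoint(checkpointDir)   // metadata and data checkpoints go to S3
      // ... define DStreams and output operations here ...
      ssc
    }

    // Recover from the checkpoint if it exists, otherwise build a new context.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()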

Re: Spark streaming checkpoint against s3

2015-10-14 Thread Tian Zhang
It looks like the reconstruction of the SparkContext from checkpoint data tries to locate the jar files of previous failed runs. It cannot find them because our jar files are on local machines and were cleaned up after each failed run.
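One way to avoid the stale local paths is to register the application jar at a durable location that survives across runs; a sketch, with an illustrative path (whether Spark can fetch jars directly from s3:// depends on the deployment):

    import org.apache.spark.SparkConf

    // Point the application jar at a stable location instead of a local path
    // that gets cleaned up between runs.
    val conf = new SparkConf()
      .setAppName("sparkJobName")
      .setJars(Seq("s3://myBucket/jars/my-streaming-app.jar"))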

Re: updateStateByKey and stack overflow

2015-10-13 Thread Tian Zhang
It turns out that our HDFS checkpoint failed, but Spark Streaming kept running and building up a long lineage ...
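For context, a stateful stream only truncates its lineage when checkpoint writes actually succeed; a minimal sketch (directory and interval are illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("statefulApp"), Seconds(60))

    // The lineage of a stateful DStream is only cut when these checkpoints succeed;
    // if writes to this directory keep failing, the lineage grows with every batch
    // until the stack overflows.
    ssc.checkpoint("hdfs://namenode:8020/checkpoints/statefulApp")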

Re: "Too many open files" exception on reduceByKey

2015-10-11 Thread Tian Zhang
It turns out that Mesos can override the OS ulimit -n setting, so we increased the ulimit -n setting on the Mesos slaves.

updateStateByKey and stack overflow

2015-10-10 Thread Tian Zhang
Hi, I am following the Spark Streaming stateful application example and wrote a simple counting application with updateStateByKey. val keyStateStream = actRegBatchCountStream.updateStateByKey(update, new HashPartitioner(ssc.sparkContext.defaultParallelism), true, initKeyStateRDD) This runs
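A sketch of the four-argument updateStateByKey call quoted above, with a hypothetical socket source standing in for the real input and illustrative key/state types (String keys, Long running totals); this overload expects an iterator-based update function:

    import org.apache.spark.{HashPartitioner, SparkConf}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("statefulCounting")
    val ssc = new StreamingContext(conf, Seconds(60))
    ssc.checkpoint("hdfs://namenode:8020/checkpoints/statefulCounting")

    // Hypothetical per-batch counts; in the original mail this comes from Kafka.
    val actRegBatchCountStream = ssc.socketTextStream("localhost", 9999)
      .map(key => (key, 1L))
      .reduceByKey(_ + _)

    // Iterator-based update function required by this overload: add the new
    // per-batch counts to the previous running total for each key.
    def update(iter: Iterator[(String, Seq[Long], Option[Long])]): Iterator[(String, Long)] =
      iter.map { case (key, newCounts, prevTotal) => (key, prevTotal.getOrElse(0L) + newCounts.sum) }

    // Initial state, one entry per key we already know about (illustrative).
    val initKeyStateRDD = ssc.sparkContext.parallelize(Seq(("knownKey", 0L)))

    val keyStateStream = actRegBatchCountStream.updateStateByKey(
      update _,
      new HashPartitioner(ssc.sparkContext.defaultParallelism),
      true,             // remember the partitioner across batches
      initKeyStateRDD)

    keyStateStream.print()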

Re: "Too many open files" exception on reduceByKey

2015-10-09 Thread tian zhang
On Thu, Oct 8, 2015 at 3:22 PM, Tian Zhang <tzhang...@yahoo.com> wrote: I hit this issue with a Spark 1.3.0 stateful application (using updateStateByKey) on Mesos. It fails after running fine for about 24 hours. The error stack trace is below

Re: "Too many open files" exception on reduceByKey

2015-10-08 Thread Tian Zhang
I hit this issue with a Spark 1.3.0 stateful application (using updateStateByKey) on Mesos. It fails after running fine for about 24 hours. The error stack trace is below. I checked ulimit -n and we have very large numbers set on the machines. What else can be wrong? 15/09/27 18:45:11
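Besides raising ulimit -n (which, per the follow-up above, Mesos can override), shuffle file consolidation in Spark 1.x reduces how many files a shuffle-heavy stage like reduceByKey keeps open; a sketch, noting that this setting only applies to the hash-based shuffle in Spark 1.x:

    import org.apache.spark.SparkConf

    // Spark 1.x only: with the hash-based shuffle, consolidating intermediate
    // files reduces the number of file handles opened during reduceByKey.
    val conf = new SparkConf()
      .setAppName("shuffleHeavyJob")
      .set("spark.shuffle.consolidateFiles", "true")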

how to pass configuration properties from driver to executor?

2015-04-30 Thread Tian Zhang
Hi, we have a scenario as below and would like your suggestion. We have an app.conf file with propX=A as the default, built into the fat jar that is provided to spark-submit. We have an env.conf file with propX=B that we would like spark-submit to take as input, to overwrite the default and populate to
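One common pattern for this kind of override (a sketch, not necessarily what was used here): ship env.conf alongside the job, e.g. with spark-submit --files env.conf, and load it with Typesafe Config, falling back to the bundled app.conf. File names mirror the ones in the mail; the helper is hypothetical.

    import java.io.File
    import com.typesafe.config.{Config, ConfigFactory}

    // Load env.conf from the working directory if present (shipped via --files)
    // and fall back to app.conf, which is bundled in the fat jar as the default.
    def loadConfig(): Config = {
      val overrides = new File("env.conf")
      val defaults  = ConfigFactory.load("app")   // app.conf from the jar's classpath
      if (overrides.exists())
        ConfigFactory.parseFile(overrides).withFallback(defaults).resolve()
      else
        defaults
    }

    val propX = loadConfig().getString("propX")

With this layout, running without --files yields propX=A from the jar, while shipping env.conf yields propX=B on whichever JVMs the file is placed.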

Re: Lifecycle of RDD in spark-streaming

2014-11-26 Thread tian zhang
I have found this paper, which seems to answer most of the questions about RDD lifetime: https://www.cs.berkeley.edu/~matei/papers/2012/hotcloud_spark_streaming.pdf Tian On Tuesday, November 25, 2014 4:02 AM, Mukesh Jha me.mukesh@gmail.com wrote: Hey Experts, I wanted to understand in

2 spark streaming questions

2014-11-23 Thread tian zhang
Hi, dear Spark Streaming developers and users, we are prototyping with Spark Streaming and hit the following 2 issues on which I would like to seek your expertise. 1) We have a Spark Streaming application in Scala that reads data from Kafka into a DStream, does some processing and outputs a
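A minimal sketch of the kind of pipeline described in point 1), using the receiver-based Kafka API from that era (KafkaUtils.createStream in spark-streaming-kafka); the topic, consumer group, and ZooKeeper address are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("kafkaPrototype")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Receiver-based Kafka DStream; all connection details below are placeholders.
    val lines = KafkaUtils.createStream(
        ssc,
        "zkhost:2181",          // ZooKeeper quorum
        "prototype-group",      // consumer group id
        Map("myTopic" -> 2))    // topic -> number of receiver threads
      .map(_._2)                // drop the Kafka key, keep the message value

    // ... some processing, then an output operation ...
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()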

Re: spark streaming and the spark shell

2014-11-19 Thread Tian Zhang
I am hitting the same issue: after running for some time, if the Spark Streaming job loses or times out the Kafka connection, it just starts returning empty RDDs ... Is there a timeline for when this issue will be fixed, so that I can plan accordingly? Thanks. Tian

Re: spark 1.1.0/yarn hang

2014-10-22 Thread Tian Zhang
We have narrowed this hanging issue down to the Calliope package that we used to create an RDD from reading a Cassandra table. The Calliope native RDD interface seems to hang, and I have decided to switch to the Calliope CQL3 RDD interface.

spark 1.1.0 RDD and Calliope 1.1.0-CTP-U2-H2

2014-10-21 Thread Tian Zhang
Hi, I am using the latest Calliope library from tuplejump.com to create an RDD for a Cassandra table. I am on a 3-node Spark 1.1.0 cluster with YARN. My Cassandra table is defined as below and I have about 2000 rows of data inserted. CREATE TABLE top_shows ( program_id varchar, view_minute timestamp,

spark 1.1.0/yarn hang

2014-10-14 Thread tian zhang
Hi, I have a Spark 1.1.0 on YARN installation. I am using spark-submit to run a simple application. From the console output, I have 769 partitions, and after task 768 in stage 0 (count) finished, it hangs. I used jstack to dump the stack and it shows it is waiting ... Any suggestion on what might go

Re: Spark Streaming : Could not compute split, block not found

2014-10-09 Thread Tian Zhang
I have figured out why I am getting this error: we have a lot of data in Kafka and the DStream from Kafka used MEMORY_ONLY_SER, so once memory was low, Spark started to discard data that was needed later ... Once I changed to MEMORY_AND_DISK_SER, the error was gone. Tian
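The storage level of a receiver-based Kafka stream can be set explicitly on the createStream overload that takes a StorageLevel; a sketch with placeholder connection details:

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(new SparkConf().setAppName("kafkaApp"), Seconds(10))

    // Spill received blocks to disk instead of dropping them when memory runs low.
    val stream = KafkaUtils.createStream(
      ssc,
      "zkhost:2181",
      "app-group",
      Map("myTopic" -> 2),
      StorageLevel.MEMORY_AND_DISK_SER)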

Re: [ANN] SparkSQL support for Cassandra with Calliope

2014-10-06 Thread tian zhang
with version 1.1.0-CTP-U2-H2. Let us know how your testing goes. Regards, Rohit Founder CEO, Tuplejump, Inc. www.tuplejump.com The Data Engineering Platform On Sat, Oct 4, 2014 at 3:49 AM, tian zhang tzhang...@yahoo.com wrote: Hi, Rohit, Thank you for sharing

Spark 1.1.0 (w/ hadoop 2.4) versus aws-java-sdk-1.7.2.jar

2014-09-19 Thread tian zhang
Hi, Spark experts, I have the following issue when using the AWS Java SDK in my Spark application. I have narrowed it down to the following steps to reproduce the problem: 1) I have Spark 1.1.0 with Hadoop 2.4 installed on a 3-node cluster. 2) From the master node, I did the following steps: spark-shell