How to expose Spark-Shell in production?

2016-11-18 Thread kant kodali
How to expose Spark-Shell in production? 1) Should we expose it on master nodes or executor nodes? 2) Should we simply give access to those machines and the Spark-Shell binary? What is the recommended way? Thanks!

Re: How do I convert json_encoded_blob_column into a data frame? (This may be a feature request)

2016-11-17 Thread kant kodali
out. > > On Wed, Nov 16, 2016 at 4:39 PM, kant kodali <kanth...@gmail.com> wrote: > >> 1. I have a Cassandra Table where one of the columns is blob. And this >> blob contains a JSON encoded String however not all the blob's across the >> Cassandra table for that co

Re: Configure spark.kryoserializer.buffer.max at runtime does not take effect

2016-11-17 Thread kant kodali
Yeah, I feel like this is a bug, since you can't really modify the settings once you have been given a Spark session or Spark context. So the workaround would be to use --conf. In your case it would be like this: ./spark-shell --conf spark.kryoserializer.buffer.max=1g On Thu, Nov 17, 2016 at 1:59 PM,
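
A minimal sketch of the same point from application code (everything here is illustrative; the key detail is that the property is set before the SparkSession and its SparkContext are created, which is also why --conf at launch time works while setting it on a running session does not):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("kryo-buffer-example")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Must be applied before the underlying SparkContext exists;
      // changing it on an already-running session has no effect.
      .config("spark.kryoserializer.buffer.max", "1g")
      .getOrCreate()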

How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-11-17 Thread kant kodali
Hi All, I would like to flatten JSON blobs into a Data Frame using Spark/Spark SQL inside Spark-Shell. val df = spark.sql("select body from test limit 3"); // body is a json encoded blob column val df2 = df.select(df("body").cast(StringType).as("body")) when I do df2.show // shows the 3

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-11-18 Thread kant kodali
This seems to work: import org.apache.spark.sql._ val rdd = df2.rdd.map { case Row(j: String) => j } spark.read.json(rdd).show() However, I wonder if there is any inefficiency here, since I have to apply this function to billions of rows.
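
For reference, a self-contained sketch of this approach (table and column names are illustrative; it assumes the blob column is cast to a string as in the earlier message):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.StringType

    // body is a JSON-encoded blob column; cast it to a plain string first
    val df  = spark.sql("select body from test limit 3")
    val df2 = df.select(df("body").cast(StringType).as("body"))

    // Pull out the raw JSON strings and let Spark infer a schema,
    // turning each JSON document into columns of a new DataFrame.
    val jsonRdd = df2.rdd.map { case Row(j: String) => j }
    spark.read.json(jsonRdd).show()

Schema inference itself scans the data, so over billions of rows it is usually cheaper to infer (or declare) the schema once and reuse it, which is what the from_json suggestion later in this thread does.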

Re: How to use Spark SQL to connect to Cassandra from Spark-Shell?

2016-11-11 Thread kant kodali
Wait, I cannot create a CassandraSQLContext from spark-shell. Is this only for enterprise versions? Thanks! On Fri, Nov 11, 2016 at 8:14 AM, kant kodali <kanth...@gmail.com> wrote: > https://academy.datastax.com/courses/ds320-analytics- > apache-spark/spark-sql-spark-sql-basics > &

Re: How to use Spark SQL to connect to Cassandra from Spark-Shell?

2016-11-11 Thread kant kodali
ad the document on https://github.com/datastax/spark-cassandra-connector > > > Yong > > > > ------ > *From:* kant kodali <kanth...@gmail.com> > *Sent:* Friday, November 11, 2016 11:04 AM > *To:* user @spark > *Subject:* How to use Spark SQL to

How to use Spark SQL to connect to Cassandra from Spark-Shell?

2016-11-11 Thread kant kodali
How to use Spark SQL to connect to Cassandra from Spark-Shell? Any examples? I use Java 8. Thanks! kant

Re: How to use Spark SQL to connect to Cassandra from Spark-Shell?

2016-11-11 Thread kant kodali
https://academy.datastax.com/courses/ds320-analytics-apache-spark/spark-sql-spark-sql-basics On Fri, Nov 11, 2016 at 8:11 AM, kant kodali <kanth...@gmail.com> wrote: > Hi, > > This is spark-cassandra-connector > <https://github.com/datastax/spark-cassandra-connector>
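
For reference, a minimal sketch using the open-source spark-cassandra-connector's DataFrame support from spark-shell, rather than the DSE-only CassandraSQLContext (the connector version, host, keyspace and table names below are illustrative assumptions):

    // Launch the shell with the connector on the classpath, e.g.:
    //   spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.0 \
    //               --conf spark.cassandra.connection.host=127.0.0.1

    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
      .load()

    // Register the table as a view and query it with plain Spark SQL.
    df.createOrReplaceTempView("my_table")
    spark.sql("select * from my_table limit 10").show()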

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-30 Thread kant kodali
when I did this cat /proc/sys/kernel/pid_max I got 32768 On Sun, Oct 30, 2016 at 6:36 PM, kant kodali <kanth...@gmail.com> wrote: > I believe for ubuntu it is unlimited but I am not 100% sure (I just read > somewhere online). I ran ulimit -a and this is what I get > &

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-30 Thread kant kodali
i have no idea > > but i still suspecting the user, > as the user who run spark-submit is not necessary the pid for the JVM > process > > can u make sure when you "ps -ef | grep {your app id} " the PID is root? > On 10/31/16 11:21 AM, kant kodali wrote: > > The ja

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-30 Thread kant kodali
sponding user is busy > in other way > the jvm process will still not able to create new thread. > > btw the default limit for centos is 1024 > > > On 10/31/16 9:51 AM, kant kodali wrote: > > > On Sun, Oct 30, 2016 at 5:22 PM, Chan Chor Pang <chin...@indetail.co.

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-30 Thread kant kodali
On Sun, Oct 30, 2016 at 5:22 PM, Chan Chor Pang wrote: > /etc/security/limits.d/90-nproc.conf > Hi, I am using Ubuntu 16.04 LTS. I have this directory /etc/security/limits.d/ but I don't have any files underneath it. This error happens after running for 4 to 5 hours. I

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-30 Thread kant kodali
ur setting, spark job may execute by > other user. > > > On 10/31/16 10:38 AM, kant kodali wrote: > > when I did this > > cat /proc/sys/kernel/pid_max > > I got 32768 > > On Sun, Oct 30, 2016 at 6:36 PM, kant kodali <kanth...@gmail.com> wrote: > >> I believ

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-11-01 Thread kant kodali
Here is a UI of my thread dump. http://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTYvMTEvMS8tLWpzdGFja19kdW1wX3dpbmRvd19pbnRlcnZhbF8xbWluX2JhdGNoX2ludGVydmFsXzFzLnR4dC0tNi0xNy00Ng== On Mon, Oct 31, 2016 at 10:32 PM, kant kodali <kanth...@gmail.com> wrote: > Hi Vadim, > &g

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
Here is a UI of my thread dump. http://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTYvMTEvMS8tLWpzdGFja19kdW1wX3dpbmRvd19pbnRlcnZhbF8xbWluX2JhdGNoX2ludGVydmFsXzFzLnR4dC0tNi0xNy00Ng== On Mon, Oct 31, 2016 at 7:10 PM, kant kodali <kanth...@gmail.com> wrote: > Hi Ryan, > &

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
This question looks very similar to mine but I don't see any answer. http://markmail.org/message/kkxhi5jjtwyadzxt On Mon, Oct 31, 2016 at 11:24 PM, kant kodali <kanth...@gmail.com> wrote: > Here is a UI of my thread dump. > > http://fastthread.io/my-thread-report.jsp?p=c

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-31 Thread kant kodali
me...@datadoghq.com> wrote: > Have you tried to get number of threads in a running process using `cat > /proc//status` ? > > On Sun, Oct 30, 2016 at 11:04 PM, kant kodali <kanth...@gmail.com> wrote: > >> yes I did run ps -ef | grep "app_name" and it is root.

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
queues in the ForkJoinPool. Thanks! On Tue, Nov 1, 2016 at 2:19 AM, Sean Owen <so...@cloudera.com> wrote: > Possibly https://issues.apache.org/jira/browse/SPARK-17396 ? > > On Tue, Nov 1, 2016 at 2:11 AM kant kodali <kanth...@gmail.com> wrote: > >> Hi Rya

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
@Sean It looks like this problem can happen with other RDD's as well. Not just unionRDD On Tue, Nov 1, 2016 at 2:52 AM, kant kodali <kanth...@gmail.com> wrote: > Hi Sean, > > The comments seem very relevant although I am not sure if this pull > request https://github.com/apach

Re: Custom receiver for WebSocket in Spark not working

2016-11-02 Thread kant kodali
I don't see a store() call in your receive(). Search for store() in here http://spark.apache.org/docs/latest/streaming-custom-receivers.html On Wed, Nov 2, 2016 at 10:23 AM, Cassa L wrote: > Hi, > I am using spark 1.6. I wrote a custom receiver to read from WebSocket. > But
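
For illustration, a minimal receiver sketch showing where store() fits (the actual WebSocket client is omitted; the message-reading line is a placeholder):

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    class WebSocketReceiver(url: String)
      extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

      override def onStart(): Unit = {
        // Open the connection on its own thread so onStart() returns quickly.
        new Thread("websocket-receiver") {
          override def run(): Unit = receiveLoop()
        }.start()
      }

      override def onStop(): Unit = {
        // Close the connection / signal the thread to stop here.
      }

      private def receiveLoop(): Unit = {
        while (!isStopped()) {
          val message = "..." // placeholder: read one message from the WebSocket
          store(message)      // without store(), nothing ever reaches Spark
        }
      }
    }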

why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
When I do ps -elfT | grep "spark-driver-program.jar" | wc -l the result is around 32K. Why does it create so many threads, and how can I limit this?

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
so many? On Mon, Oct 31, 2016 at 3:25 AM, Sean Owen <so...@cloudera.com> wrote: > ps -L [pid] is what shows threads. I am not sure this is counting what you > think it does. My shell process has about a hundred threads, and I can't > imagine why one would have thousands unless your ap

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
if the leak threads are in the driver side. > > Does it happen in the driver or executors? > > On Mon, Oct 31, 2016 at 12:20 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi Ryan, >> >> Ahh My Receiver.onStop method is currently empty. >> >> 1) I hav

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
e `jstack` to find out > the name of leaking threads? > > On Mon, Oct 31, 2016 at 12:35 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi Ryan, >> >> It happens on the driver side and I am running on a client mode (not the >> cluster mode). >> >>

How do I specify StorageLevel in KafkaUtils.createDirectStream?

2016-11-03 Thread kant kodali
JavaInputDStream<ConsumerRecord<String, String>> directKafkaStream = KafkaUtils.createDirectStream(ssc, LocationStrategies.PreferConsistent(), ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
ue, Nov 1, 2016 at 11:25 AM, kant kodali <kanth...@gmail.com> wrote: > >> AH!!! Got it! Should I use 2.0.1 then ? I don't see 2.1.0 >> >> On Tue, Nov 1, 2016 at 10:14 AM, Shixiong(Ryan) Zhu < >> shixi...@databricks.com> wrote: >> >>>

random idea

2016-11-02 Thread kant kodali
Hi Guys, I have a random idea and it would be great to receive some input. Can we have an HTTP/2-based receiver for Spark Streaming? I am wondering why not build microservices using Spark when needed? I can see it is not meant for that, but I like to think it can be possible. To be more concrete,

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
y? > > This may depend on your driver program. Do you spawn any threads in > it? Could you share some more information on the driver program, spark > version and your environment? It would greatly help others to help you > > On Mon, Oct 31, 2016 at 3:47 AM, kant kodali <kanth...@gma

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
I am also under the assumption that the *onStart* function of the Receiver is only called once by Spark. Please correct me if I am wrong. On Mon, Oct 31, 2016 at 11:35 AM, kant kodali <kanth...@gmail.com> wrote: > My driver program runs a spark streaming job. And it spawns a thread by

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
hich types of threads are leaking? > > On Mon, Oct 31, 2016 at 11:50 AM, kant kodali <kanth...@gmail.com> wrote: > >> I am also under the assumption that *onStart *function of the Receiver is >> only called only once by Spark. please correct me if I am wrong. >> &

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
AH!!! Got it! Should I use 2.0.1 then ? I don't see 2.1.0 On Tue, Nov 1, 2016 at 10:14 AM, Shixiong(Ryan) Zhu <shixi...@databricks.com > wrote: > Dstream "Window" uses "union" to combine multiple RDDs in one window into > a single RDD. > > On Tue, Nov

Re: ClassCastException while running a simple wordCount

2016-10-11 Thread kant kodali
;> launcher, driver and workers could lead to the bug you're seeing. A common >>> reason for a mismatch is if the SPARK_HOME environment variable is set. >>> This will cause the spark-submit script to use the launcher determined by >>> that environment variable, regardless of the

Re: Third party library

2016-11-26 Thread kant kodali
program in >> scala/java. >> >> Regards, >> Vineet >> >> On Sat, Nov 26, 2016 at 11:43 AM, kant kodali <kanth...@gmail.com> wrote: >> >>> Yes this is a Java JNI question. Nothing to do with Spark really. >>> >>> java.lan

few basic questions on structured streaming

2016-12-08 Thread kant kodali
Hi All, I read the documentation on Structured Streaming based on event time and I have the following questions. 1. What happens if an event arrives a few days late? It looks like we have an unbounded table with sorted time intervals as keys, but I assume Spark doesn't keep several days' worth of data in
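
For the late-data part of the question, structured streaming lets you bound how long per-window state is kept with a watermark; a hedged sketch (assumes Spark 2.1+, and a streaming DataFrame named events with an eventTime column, both of which are illustrative):

    import org.apache.spark.sql.functions.{col, window}

    // Accept events up to 2 days late; state for windows older than the
    // watermark can be dropped instead of growing without bound.
    val counts = events
      .withWatermark("eventTime", "2 days")
      .groupBy(window(col("eventTime"), "1 hour"))
      .count()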

Do we really need mesos or yarn? or is standalone sufficient?

2016-12-16 Thread kant kodali
Do we really need mesos or yarn? Or is standalone sufficient for production systems? I understand the difference, but I don't know the capabilities of a standalone cluster. Does anyone have experience deploying standalone in production?

theory question

2016-12-17 Thread kant kodali
Given a set of transformations, does Spark create multiple DAGs and pick one by some metric, such as, say, a higher degree of concurrency or something else, as the typical task-graph model in parallel computing suggests? Or does it simply build one simple DAG by going through

Re: Wrting data from Spark streaming to AWS Redshift?

2016-12-11 Thread kant kodali
@shyla a side question: What can Redshift do that Spark cannot? Trying to understand your use case. On Fri, Dec 9, 2016 at 8:47 PM, ayan guha wrote: > Ideally, saving data to external sources should not be any different. give > the write options as stated in the

What can mesos or yarn do that spark standalone cannot do?

2017-01-15 Thread kant kodali
Hi, What can mesos or yarn do that spark standalone cannot do? Thanks!

Re: What benefits do we really get out of colocation?

2016-12-03 Thread kant kodali
hmm GCE pretty much seems to follow the same model as AWS. On Sat, Dec 3, 2016 at 1:22 AM, kant kodali <kanth...@gmail.com> wrote: > GCE seems to have better options. Any one had any experience with GCE? > > On Sat, Dec 3, 2016 at 1:16 AM, Manish Malhotra < > manish.ma

What benefits do we really get out of colocation?

2016-12-02 Thread kant kodali
I wonder what benefits I really get if I colocate my Spark worker process and Cassandra server process on each node? I understand the concept of moving compute towards the data instead of moving data towards the computation, but it sounds more like one is trying to optimize for network latency.

Re: What benefits do we really get out of colocation?

2016-12-03 Thread kant kodali
Forgot to mention that my entire cluster is in one DC, so if it were across multiple DCs then colocating would make sense in theory as well. On Sat, Dec 3, 2016 at 1:12 AM, kant kodali <kanth...@gmail.com> wrote: > Thanks Sean! Just for the record I am currently seeing 95 MB/s RX (Receive >

Re: What benefits do we really get out of colocation?

2016-12-03 Thread kant kodali
colocation for various use cases. > > AWS emphmerial is not good for reliable storage file system, EBS is the > expensive alternative :) > > On Sat, Dec 3, 2016 at 1:12 AM, kant kodali <kanth...@gmail.com> wrote: > >> Thanks Sean! Just for the record I am currently seeing 95 M

Re: What benefits do we really get out of colocation?

2016-12-03 Thread kant kodali
tion time is longer > > Le 3 déc. 2016 7:39 AM, "kant kodali" <kanth...@gmail.com> a écrit : > >> >> I wonder what benefits do I really I get If I colocate my spark worker >> process and Cassandra server process on each node? >> >> I understand the

Re: What benefits do we really get out of colocation?

2016-12-03 Thread kant kodali
ugh that's not as > much an issue in this context. > > On Sat, Dec 3, 2016 at 8:42 AM kant kodali <kanth...@gmail.com> wrote: > >> wait, how is that a benefit? isn't that a bad thing if you are saying >> colocating leads to more latency and overall execution

Re: What benefits do we really get out of colocation?

2016-12-03 Thread kant kodali
Ephemeral storage on SSD will be very painful to maintain, especially with large datasets; we will pretty soon have somewhere in the PB range. I am thinking of leveraging something like the below, but I am not sure how much performance gain we could get out of it. https://github.com/stec-inc/EnhanceIO On Sat, Dec

java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread kant kodali
I have a lot of these exceptions happening: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found Any ideas what this could be?

quick question

2016-12-01 Thread kant kodali
Assume I am running a Spark client program in client mode and a Spark cluster in standalone mode. I want some clarification on the following: 1. Building the DAG 2. DAG scheduler 3. Task scheduler I want to know which of the above parts is done by the Spark client and which of the above parts are done by

Can I have two different receivers for my Spark client program?

2016-11-30 Thread kant kodali
Hi All, I am wondering if it makes sense to have two receivers inside my Spark client program? The use case is as follows. 1) We have to support a feed from Kafka, so this will be direct receiver #1. We need to perform batch inserts from the Kafka feed into Cassandra. 2) A gRPC receiver where we

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-12-05 Thread kant kodali
= df.select(df("body").cast(StringType).as("body")) > val df2 = Seq("""{"a": 1}""").toDF("body") > val schema = spark.read.json(df2.as[String].rdd).schema > df2.select(from_json(col("body"), schema)).show() > &
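
The suggestion quoted above, reconstructed as a small self-contained sketch (infer the schema once from a sample, then apply it to every row with from_json; assumes Spark 2.1+):

    import org.apache.spark.sql.functions.{col, from_json}
    import spark.implicits._

    // Sample data standing in for the blob column after it has been cast to string
    val df2 = Seq("""{"a": 1}""").toDF("body")

    // Infer the schema once from the JSON strings...
    val schema = spark.read.json(df2.as[String].rdd).schema

    // ...then parse every row against that fixed schema, avoiding
    // repeated schema inference over billions of rows.
    df2.select(from_json(col("body"), schema)).show()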

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-12-05 Thread kant kodali
use to build the static schema code automatically > <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/1128172975083446/2840265927289860/latest.html> > . > > Would that work for you? If not, why not? > > On Wed, Nov 23, 2016 at 2:48 A

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread kant kodali
stringIntegerJavaPairRDD .collect() .forEach((Tuple2<String, Long> KV) -> { String status = KV._1(); Long count = KV._2(); map.put(status, count);

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread kant kodali
Bytes()); } }); Thanks, kant On Wed, Nov 30, 2016 at 2:11 PM, Marco Mistroni <mmistr...@gmail.com> wrote: > Could you paste reproducible snippet code? > Kr > > On 30 Nov 2016 9:08 pm, "kant kodali" <kanth...@gmail.com> wrote: > >> I have lot of these exceptions happening >> >> java.lang.Exception: Could not compute split, block input-0-1480539568000 >> not found >> >> >> Any ideas what this could be? >> >

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-12-01 Thread kant kodali
I also use this super(StorageLevel.MEMORY_AND_DISK_2()); inside my receiver On Wed, Nov 30, 2016 at 10:44 PM, kant kodali <kanth...@gmail.com> wrote: > Here is another transformation that might cause the error but it has to be > one of these two since I only have two tra

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-12-01 Thread kant kodali
? Is it a disk block? If so, what is its default size? And finally, why does the following error happen so often? java.lang.Exception: Could not compute split, block input-0-1480539568000 not found On Thu, Dec 1, 2016 at 12:42 AM, kant kodali <kanth...@gmail.com> wrote: > I also use t

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-12-01 Thread kant kodali
worker instance) On Thu, Dec 1, 2016 at 12:55 AM, kant kodali <kanth...@gmail.com> wrote: > My batch interval is 1s > slide interval is 1s > window interval is 1 minute > > I am using a standalone alone cluster. I don't have any storage layer like > HDFS. so I dont k

Re: Gradle dependency problem with spark

2016-12-21 Thread kant kodali
ens to have the functionality that both dependencies want, and hope > that exists. Spark should shade Guava at this point but doesn't mean that > you won't hit this problem from transitive dependencies. > > On Fri, Dec 16, 2016 at 11:05 AM kant kodali <kanth...@gmail.com> wrote: &

Re: Gradle dependency problem with spark

2016-12-16 Thread kant kodali
AM, kant kodali <kanth...@gmail.com> wrote: > Hi Guys, > > Here is the simplified version of my problem. I have the following problem > and I new to gradle > > > dependencies { > compile group: 'org.apache.spark', name: 'spark-core_2.11', version:

Gradle dependency problem with spark

2016-12-16 Thread kant kodali
Hi Guys, here is the simplified version of my problem. I have the following problem and I am new to Gradle: dependencies { compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.0.2' compile group: 'com.github.brainlag', name: 'nsq-client', version: '1.0.0.RC2' } I took

Re: Do we really need mesos or yarn? or is standalone sufficient?

2016-12-16 Thread kant kodali
ll you > need more. Otherwise, try YARN or MESOS depending on the rest of your > components. > > > > 2cents > > > > Saif > > > > *From:* kant kodali [mailto:kanth...@gmail.com] > *Sent:* Friday, December 16, 2016 3:14 AM > *To:* user @spark > *Subject:* Do

question on DStreams

2017-03-28 Thread kant kodali
Hi All, I have the following question. Imagine there is a DStream of JSON strings coming in and I apply a few different filters in parallel on the same DStream (so these filters are not applied one after the other). For example, here is the pseudo code if that helps: dstream.filter(x -> { check for
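
A small sketch of the pattern being described (assumes an existing DStream[String]; the filter predicates and the println sinks are placeholders):

    import org.apache.spark.streaming.dstream.DStream

    def branch(dstream: DStream[String]): Unit = {
      dstream.cache() // avoid recomputing the parent stream once per branch

      val typeA = dstream.filter(_.contains("\"type\":\"a\"")) // placeholder predicate
      val typeB = dstream.filter(_.contains("\"type\":\"b\"")) // placeholder predicate

      // Each branch becomes its own output operation on every batch.
      typeA.foreachRDD(rdd => rdd.foreach(r => println(s"A: $r")))
      typeB.foreachRDD(rdd => rdd.foreach(r => println(s"B: $r")))
    }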

What CI tool does Databricks use?

2017-03-26 Thread kant kodali
What CI tool does Databricks use?

is there a way to persist the lineages generated by spark?

2017-04-03 Thread kant kodali
Hi All, I am wondering if there is a way to persist the lineages generated by Spark underneath? Some of our clients want us to prove that the result of the computation we are showing on a dashboard is correct, and for that, if we can show the lineage of transformations that are executed to get to

Why do we ever run out of memory in Spark Structured Streaming?

2017-04-04 Thread kant kodali
Why do we ever run out of memory in Spark Structured Streaming, especially when memory can always spill to disk? Until the disk is full we shouldn't be out of memory, isn't it? Sure, thrashing will happen more frequently and degrade performance, but why do we ever run out of memory even in case of

Re: Why do we ever run out of memory in Spark Structured Streaming?

2017-04-05 Thread kant kodali
-guide.html#handling-late-data-and-watermarking > Monitoring - http://spark.apache.org/docs/latest/structured- > streaming-programming-guide.html#monitoring-streaming-queries > > In case you were referring to something else, please give us more context > details - what query, what are the

Re: Is the trigger interval the same as batch interval in structured streaming?

2017-04-10 Thread kant kodali
uery that would produce the answer you > want. Structured streaming will figure out an efficient way to compute > that answer incrementally as new data arrives. > > On Mon, Apr 10, 2017 at 12:20 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi Michael, >> >> T

Re: Is the trigger interval the same as batch interval in structured streaming?

2017-04-10 Thread kant kodali
che.org/docs/latest/structured-streaming-programming-guide.html#window-operations-on-event-time> > function. > > On Thu, Apr 6, 2017 at 10:26 AM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> Is the trigger interval mentioned in this doc >> <h

Re: Is the trigger interval the same as batch interval in structured streaming?

2017-04-10 Thread kant kodali
re - https://github.com/apache/ > spark/blob/master/examples/src/main/scala/org/apache/ > spark/examples/sql/streaming/StructuredNetworkWordCountWindowed.scala > > > On Mon, Apr 10, 2017 at 12:55 PM, kant kodali <kanth...@gmail.com> wrote: > >> Thanks again! Looks like the update mod

Does Apache Spark use any Dependency Injection framework?

2017-04-02 Thread kant kodali
Hi All, I am wondering if I can get the SparkConf object through dependency injection? I currently use the HOCON library to store all key/value pairs required to

Is the trigger interval the same as batch interval in structured streaming?

2017-04-06 Thread kant kodali
Hi All, Is the trigger interval mentioned in this doc the same as the batch interval in structured streaming? For example, I have a long-running receiver (not Kafka) which sends me a real-time stream; I want to use window

Re: Hi

2017-04-07 Thread kant kodali
oops sorry. Please ignore this. wrong mailing list

Hi

2017-04-07 Thread kant kodali
Hi All, I read the docs; however, I still have the following question. For stateful stream processing, is HDFS mandatory? In some places I see it is required and in other places I see that RocksDB can be used. I just want to know if HDFS is mandatory for stateful stream processing. Thanks!

Is checkpointing in Spark Streaming Synchronous or Asynchronous ?

2017-04-07 Thread kant kodali
Hi All, Is checkpointing in Spark Streaming synchronous or asynchronous? In other words, can Spark continue processing the stream while checkpointing? Thanks!

Re: Why do we ever run out of memory in Spark Structured Streaming?

2017-04-05 Thread kant kodali
reset it back to zero. so any advice on how to best approach this scenario? Thanks much! On Wed, Apr 5, 2017 at 12:39 AM, kant kodali <kanth...@gmail.com> wrote: > Hi! > > I am talking about "stateful operations like aggregations". Does this > happen on heap or off he

Re: Why do we ever run out of memory in Spark Structured Streaming?

2017-04-05 Thread kant kodali
Actually, I want to reset my counters every 24 hours, so shouldn't the window and slide interval be 24 hours? If so, how do I send updates to a real-time dashboard every second? Isn't the trigger interval the same as the slide interval? On Wed, Apr 5, 2017 at 7:17 AM, kant kodali <kanth...@gmail.
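
One hedged way to get both behaviours (24-hour windows, but a refreshed result on every trigger) is the update output mode with a short processing-time trigger; a sketch assuming Spark 2.2+ and illustrative stream, column and sink names:

    import org.apache.spark.sql.functions.{col, sum, window}
    import org.apache.spark.sql.streaming.Trigger

    val totals = events
      .groupBy(window(col("eventTime"), "24 hours"), col("hourlyPay"))
      .agg(sum("hourlyPay").as("total"))

    val query = totals.writeStream
      .outputMode("update")                        // emit only windows whose totals changed
      .trigger(Trigger.ProcessingTime("1 second")) // push a refresh every second
      .format("console")                           // stand-in for the real dashboard sink
      .start()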

Re: Is checkpointing in Spark Streaming Synchronous or Asynchronous ?

2017-04-11 Thread kant kodali
all your state every time. And we will be making this > asynchronous soon. > > On Fri, Apr 7, 2017 at 3:19 AM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> Is checkpointing in Spark Streaming Synchronous or Asynchronous ? other >> words can

Problem with Java and Scala interoperability // streaming

2017-04-19 Thread kant kodali
Hi All, I get the following errors whichever way I try, either lambda or generics. I am using Spark 2.1 and Scala 2.11.8. StreamingContext ssc = StreamingContext.getOrCreate(hdfsCheckpointDir, () -> {return createStreamingContext();}, null, false); ERROR StreamingContext ssc =

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread kant kodali
oops my bad. I see it now! sorry. On Wed, Apr 19, 2017 at 1:56 PM, Marcelo Vanzin <van...@cloudera.com> wrote: > I see a bunch of getOrCreate methods in that class. They were all > added in SPARK-6752, a long time ago. > > On Wed, Apr 19, 2017 at 1:51 PM, kant kodali <kant

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread kant kodali
works now! thanks much! On Wed, Apr 19, 2017 at 2:05 PM, kant kodali <kanth...@gmail.com> wrote: > oops my bad. I see it now! sorry. > > On Wed, Apr 19, 2017 at 1:56 PM, Marcelo Vanzin <van...@cloudera.com> > wrote: > >> I see a bunch of getOrCreate methods in t

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread kant kodali
e writing Java? > > On Wed, Apr 19, 2017 at 1:42 PM, kant kodali <kanth...@gmail.com> wrote: > > Hi All, > > > > I get the following errors whichever way I try either lambda or > generics. I > > am using > > spark 2.1 and scalla 2.11.8 > >

Spark Jobs filling up the disk at SPARK_LOCAL_DIRS location

2017-03-09 Thread kant kodali
Hi All, My Spark streaming jobs are filling up the disk within a short amount of time (< 10 mins). I have 10 GB of disk space and it is getting full at the SPARK_LOCAL_DIRS location. In my case SPARK_LOCAL_DIRS is set to /usr/local/spark/temp. There are a lot of files like this: input-0-1489072623600

question on Write Ahead Log (Spark Streaming )

2017-03-08 Thread kant kodali
Hi All, I am using a receiver-based approach, and I understand that the Spark Streaming APIs will convert the data received from the receiver into blocks, and these in-memory blocks are also stored in the WAL if one enables it. My upstream source, which is not Kafka, can also replay, by which I mean if
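
For reference, a minimal sketch of turning the receiver write-ahead log on (the configuration key is the standard one; the app name and checkpoint path are illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("wal-example")
      // Received blocks are also written to a write-ahead log under the
      // checkpoint directory so they can be replayed after a failure.
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("hdfs:///checkpoints/wal-example") // the WAL lives under this directory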

How to unit test spark streaming?

2017-03-07 Thread kant kodali
Hi All, How do I unit test Spark Streaming, or Spark in general? How do I test the results of my transformations? Also, more importantly, don't we need to spawn master and worker JVMs either on one or multiple nodes? Thanks! kant

Re: [ANNOUNCE] Apache Bahir 2.1.0 Released

2017-03-05 Thread kant kodali
How about an HTTP/2/REST connector for Spark? Is that something we can expect? Thanks! On Wed, Feb 22, 2017 at 4:07 AM, Christian Kadner wrote: > The Apache Bahir community is pleased to announce the release > of Apache Bahir 2.1.0 which provides the following extensions for >

Re: How to unit test spark streaming?

2017-03-07 Thread kant kodali
Agreed with the statement in quotes below: whether one wants to do unit tests or not, it is a good practice to write code that way. But I think the more painful and tedious task is to mock/emulate all the nodes, such as Spark workers/master/HDFS/the input source stream and all that. I wish there is
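
One common way to sidestep spawning real master/worker JVMs is to run the whole test in-process with a local master and a queue-backed stream; a hedged sketch (plain assert style rather than any particular test framework):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import scala.collection.mutable

    val conf = new SparkConf().setMaster("local[2]").setAppName("streaming-unit-test")
    val ssc  = new StreamingContext(conf, Seconds(1))

    // Drive the transformation with an in-memory queue instead of a real source.
    val input  = mutable.Queue(ssc.sparkContext.parallelize(Seq(1, 2, 3)))
    val result = mutable.ArrayBuffer.empty[Int]

    ssc.queueStream(input).map(_ * 2).foreachRDD(rdd => result ++= rdd.collect())

    ssc.start()
    ssc.awaitTerminationOrTimeout(3000) // let a couple of batches run
    ssc.stop(stopSparkContext = true)

    assert(result.sorted == Seq(2, 4, 6))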

Re: What is the difference between forEachAsync vs forEachPartitionAsync?

2017-04-02 Thread kant kodali
to the lambda respectively. On Sun, Apr 2, 2017 at 8:36 PM, kant kodali <kanth...@gmail.com> wrote: > Hi all, > > What is the difference between forEachAsync vs forEachPartitionAsync? I > couldn't find any comments from the Javadoc. If I were to guess here is > what I would say but p

What is the difference between forEachAsync vs forEachPartitionAsync?

2017-04-02 Thread kant kodali
Hi all, What is the difference between forEachAsync vs forEachPartitionAsync? I couldn't find any comments in the Javadoc. If I were to guess, here is what I would say, but please correct me if I am wrong: forEachAsync just iterates through values from all partitions one by one in an async manner
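
Both async variants return a FutureAction immediately instead of blocking; the practical difference is what the supplied function receives: one element at a time versus an iterator over an entire partition. A small sketch (assumes an existing SparkSession named spark):

    import scala.concurrent.Await
    import scala.concurrent.duration._

    val rdd = spark.sparkContext.parallelize(1 to 100, numSlices = 4)

    // Invoked once per element, across all partitions in parallel.
    val f1 = rdd.foreachAsync(x => println(x))

    // Invoked once per partition with an Iterator, which is handy for
    // per-partition setup such as opening a connection or batching writes.
    val f2 = rdd.foreachPartitionAsync(it => println(s"partition size: ${it.size}"))

    // FutureAction extends scala.concurrent.Future, so you can await it if needed.
    Await.result(f1, 1.minute)
    Await.result(f2, 1.minute)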

Apache Drill vs Spark SQL

2017-04-06 Thread kant kodali
Hi All, I am very impressed with the work done on Spark SQL; however, when I have to pick something to serve real-time queries I am in a dilemma for the following reasons. 1. Even though Spark SQL has logical plans, physical plans, run-time code generation and all that, it still doesn't look

Re: is there a way to persist the lineages generated by spark?

2017-04-07 Thread kant kodali
client > describes a calculation, but in the end the description is wrong. > > > On 4. Apr 2017, at 05:19, kant kodali <kanth...@gmail.com> wrote: > > > > Hi All, > > > > I am wondering if there a way to persist the lineages generated by spark >

Questions on HDFS with Spark

2017-04-18 Thread kant kodali
Hi All, I've been using Spark standalone for a while and now it's time for me to install HDFS. If a Spark worker goes down, the Spark master restarts the worker; similarly, if a datanode process goes down, it looks like it is not the namenode's job to restart the datanode, and if so, 1) should I use

Does Spark SQL use Calcite?

2017-08-10 Thread kant kodali
Hi All, Does Spark SQL use Calcite? If so, what for? I thought Spark SQL has Catalyst, which would generate its own logical plans, physical plans and other optimizations. Thanks, Kant

Re: Does Spark SQL use Calcite?

2017-08-10 Thread kant kodali
Since I see a Calcite dependency in Spark, I wonder where Calcite is being used? On Thu, Aug 10, 2017 at 1:30 PM, Sathish Kumaran Vairavelu < vsathishkuma...@gmail.com> wrote: > Spark SQL doesn't use Calcite > > On Thu, Aug 10, 2017 at 3:14 PM kant kodali <kanth...@gmail.com&g

Re: Does Spark SQL use Calcite?

2017-08-11 Thread kant kodali
@Ryan it looks like if I enable the thrift server I need to go through Hive. I was talking more about having a JDBC connector for Spark SQL itself, in other words not going through Hive. On Fri, Aug 11, 2017 at 6:50 PM, kant kodali <kanth...@gmail.com> wrote: > @Ryan Does it work with Spark

Re: Does Spark SQL use Calcite?

2017-08-11 Thread kant kodali
>> On Aug 10, 2017, at 2:24 PM, Sathish Kumaran Vairavelu < >> vsathishkuma...@gmail.com> wrote: >> >> I think it is for hive dependency. >> On Thu, Aug 10, 2017 at 4:14 PM kant kodali <kanth...@gmail.com> wrote: >> >>> Since I see a ca

Re: [ANNOUNCE] Announcing Apache Spark 2.2.0

2017-07-17 Thread kant kodali
+1 On Tue, Jul 11, 2017 at 3:56 PM, Jean Georges Perrin wrote: > Awesome! Congrats! Can't wait!! > > jg > > > On Jul 11, 2017, at 18:48, Michael Armbrust > wrote: > > Hi all, > > Apache Spark 2.2.0 is the third release of the Spark 2.x line. This > release

Is there a way to run Spark SQL through REST?

2017-07-22 Thread kant kodali
Is there a way to run Spark SQL through REST?

some Ideas on expressing Spark SQL using JSON

2017-07-25 Thread kant kodali
Hi All, I am thinking of expressing Spark SQL using JSON in the following way. For example: *Query using Spark DSL* DS.filter(col("name").equalTo("john")) .groupBy(functions.window(df1.col("TIMESTAMP"), "24 hours", "24 hours"), df1.col("hourlyPay"))

Re: What are some disadvantages of issuing a raw sql query to spark?

2017-07-25 Thread kant kodali
al plan from the raw > sql (or DSL) and optimize on that. Ideally you would end up with the same > physical plan, irrespective of it been written in raw sql / DSL. > > Regards, > Keith. > > http://keith-chapman.com > > On Tue, Jul 25, 2017 at 12:50 AM, kant kodali

What are some disadvantages of issuing a raw sql query to spark?

2017-07-25 Thread kant kodali
Hi All, I just want to run a Spark structured streaming job similar to this: DS.filter(col("name").equalTo("john")) .groupBy(functions.window(df1.col("TIMESTAMP"), "24 hours", "24 hours"), df1.col("hourlyPay")) .agg(sum("hourlyPay").as("total")); I am wondering if I can
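
For comparison, a hedged sketch of the same aggregation issued as raw SQL against a registered view (the view name is illustrative; the column names come from the DSL snippet above, and Spark's SQL parser exposes the same window() function):

    df1.createOrReplaceTempView("payments")

    val totals = spark.sql("""
      SELECT window(`TIMESTAMP`, '24 hours') AS time_window,
             hourlyPay,
             sum(hourlyPay) AS total
      FROM payments
      WHERE name = 'john'
      GROUP BY window(`TIMESTAMP`, '24 hours'), hourlyPay
    """)

As the reply quoted below notes, both forms should ideally reduce to the same physical plan after Catalyst optimization, so the choice is mostly about ergonomics.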

Re: Is there a way to run Spark SQL through REST?

2017-07-23 Thread kant kodali
. On Sat, Jul 22, 2017 at 6:19 AM, Sumedh Wale <sw...@snappydata.io> wrote: > On Saturday 22 July 2017 01:31 PM, kant kodali wrote: > >> Is there a way to run Spark SQL through REST? >> > > There is spark-jobserver (https://github.com/spark-jobs > erver/spark-jobserver)
