both cases, I would expect the remote spark app to be killed and my
>> local process to be killed.
>>
>> Why is this happening? And how can I kill a Spark app, launched via a
>> shell script, from the terminal without going to the Spark master UI?
>>
>> I want to launch the Spark app via a script and log the PID so I can
>> monitor it remotely.
>>
>> thanks for the help
>>
>>
>
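For the launch-and-kill-by-PID part, a minimal sketch (the spark-submit invocation is replaced by a stand-in command; paths and file names are placeholders). One caveat worth noting: killing the local spark-submit PID only stops the app when the driver runs in that process (client mode); in cluster mode the driver lives elsewhere, which would explain the remote app surviving.

```python
import os
import signal
import subprocess
import tempfile

# Stand-in for: subprocess.Popen(["spark-submit", "--master", "...", "app.jar"])
proc = subprocess.Popen(["sleep", "1000"])

# Log the PID so another script can find the process later.
pid_file = os.path.join(tempfile.gettempdir(), "myapp.pid")
with open(pid_file, "w") as f:
    f.write(str(proc.pid))

# Later (possibly from a different script): read the PID and terminate it.
with open(pid_file) as f:
    pid = int(f.read())
os.kill(pid, signal.SIGTERM)
proc.wait()

# On POSIX, a child killed by SIGTERM reports returncode == -SIGTERM.
killed = proc.returncode == -signal.SIGTERM
```

The same pattern works as a shell one-liner with `$!` and a pid file; the Python version just makes the bookkeeping explicit.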
--
*VARUN SHARMA*
*Flipkart*
*Bangalore*
efault
>
>
> It looks like I cannot access the dynamically passed-in value from the
> command line this way.
>
> In Hadoop, the Configuration object includes all of the passed-in
> key/value pairs in the application. How do I achieve that in Spark?
>
> Thanks
>
> Yong
>
>
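A sketch of the command-line pattern being asked about (the parser below is illustrative, not the actual SparkConf API; in Spark itself, keys passed via `spark-submit --conf` can, as far as I know, be read in the driver with `sc.getConf.get("spark.myapp.key")`):

```python
import sys

def parse_conf(argv):
    """Collect key=value pairs that follow --conf flags, SparkConf-style."""
    conf = {}
    it = iter(argv)
    for arg in it:
        if arg == "--conf":
            key, _, value = next(it).partition("=")
            conf[key] = value
    return conf

# Simulated command line:
#   spark-submit ... --conf spark.myapp.env=prod --conf spark.myapp.rate=10
argv = ["app.py", "--conf", "spark.myapp.env=prod",
        "--conf", "spark.myapp.rate=10"]
conf = parse_conf(argv)
```

Note that Spark only forwards keys starting with `spark.` through `--conf`, so a custom namespace like the hypothetical `spark.myapp.*` above is the usual workaround.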
in the ability to
>> scale out processing beyond your number of partitions.
>>
>> We’re doing this to scale up from 36 partitions / topic to 140 partitions
>> (20 cores * 7 nodes) and it works great.
>>
>> -adrian
>>
>> From: varun sharma
>> Date:
number of tasks created was independent of Kafka partitions; I need
something like that.
Is there any config available if I don't need one-to-one semantics?
Is there any way I can repartition without incurring any additional cost?
Thanks
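To illustrate why going beyond one task per Kafka partition is not free: spreading records over more partitions means rehashing and moving every record, which is essentially what `DStream.repartition(n)` does. A plain-Python sketch of that cost (function names are illustrative, not Spark APIs):

```python
def repartition(partitions, n):
    """Hash-redistribute records from len(partitions) lists into n lists."""
    out = [[] for _ in range(n)]
    moved = 0
    for p in partitions:
        for record in p:
            target = hash(record) % n
            out[target].append(record)
            moved += 1  # every record is rehashed and moved: the shuffle cost
    return out, moved

# 3 Kafka-like partitions of 5 records each, spread over 7 task partitions.
kafka_like = [[f"p{i}-r{j}" for j in range(5)] for i in range(3)]
out, moved = repartition(kafka_like, 7)
```

So there is no config that gives more tasks than partitions without this per-record movement; the cheaper alternative is usually adding Kafka partitions, as suggested above.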
>> is job for offsets 20-30, and sent it to executor.
>> - executor does all the steps (if there is only one stage) and saves
>> offset 30 to zookeeper.
>>
>> This way I lose data in offsets 10-20.
>>
>> How should this be handled correctly?
>>
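One way to avoid losing offsets 10-20 is to commit offsets to ZooKeeper only in contiguous order, so a finished-but-out-of-order range waits until every earlier range has succeeded. A simplified sketch (not Spark code; the class and method names are assumed):

```python
class OffsetTracker:
    def __init__(self, start):
        self.committed = start  # highest safely committed offset
        self.done = set()       # finished ranges awaiting commit

    def complete(self, lo, hi):
        """Record a finished (lo, hi) range; commit only contiguous ranges."""
        self.done.add((lo, hi))
        while any(r[0] == self.committed for r in self.done):
            rng = next(r for r in self.done if r[0] == self.committed)
            self.done.remove(rng)
            self.committed = rng[1]
        return self.committed

t = OffsetTracker(10)
t.complete(20, 30)  # finished out of order: committed stays at 10
t.complete(10, 20)  # gap filled: committed now advances through 20 to 30
```

With this discipline, a crash after the 20-30 batch but before the 10-20 batch restarts from offset 10, so nothing is skipped.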
>> Mon, 26 O
next batch. That
> is not desirable for me.
> It works like this because JobScheduler is an actor: after it reports an
> error, it moves on to the next message, which starts the next batch job,
> while ssc.awaitTermination() runs in another thread and happens after the
> next batch starts.
>
>> Folks,
>>
>>
>>
>> I have a very large number of Kafka topics (many thousands of partitions)
>> that I want to consume, filter based on topic-specific filters, then
>> produce back to filtered topics in Kafka.
>>
>>
>>
>> Using the receiver-less based approach with Spark 1.4.1 (described here
>> <https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md>)
>> I am able to use either KafkaUtils.createDirectStream or
>> KafkaUtils.createRDD, consume from many topics, and filter them with the
>> same filters but I can't seem to wrap my head around how to apply
>> topic-specific filters, or to finally produce to topic-specific destination
>> topics.
>>
>>
>>
>> Another point would be that I will need to checkpoint the metadata after
>> each successful batch and set starting offsets per partition back to ZK. I
>> expect I can do that on the final RDDs after casting them accordingly, but
>> if anyone has any expertise/guidance doing that and is willing to share,
>> I'd be pretty grateful.
>>
>>
>>
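A sketch of per-topic filtering and per-topic destinations in one pass, assuming each record is tagged with its source topic (with the direct stream this can, if I recall correctly, be arranged via the messageHandler argument and MessageAndMetadata); all names below are illustrative, not Spark or Kafka APIs:

```python
# Per-topic filter predicates (hypothetical topics and rules).
TOPIC_FILTERS = {
    "clicks": lambda v: v.get("user") is not None,
    "errors": lambda v: v.get("level") == "FATAL",
}

def dest(topic):
    """Per-topic destination naming scheme (assumed convention)."""
    return topic + ".filtered"

def process(records):
    """records: iterable of (topic, value) -> list of (dest_topic, value)."""
    out = []
    for topic, value in records:
        keep = TOPIC_FILTERS.get(topic, lambda v: True)  # default: pass through
        if keep(value):
            out.append((dest(topic), value))
    return out

batch = [("clicks", {"user": "u1"}), ("clicks", {"user": None}),
         ("errors", {"level": "FATAL"}), ("errors", {"level": "WARN"})]
result = process(batch)
```

In Spark this whole function would run inside one mapPartitions/foreachRDD pass, so thousands of topics don't require thousands of streams.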
>
>
s not update correctly.
>
> On Tue, Oct 20, 2015 at 8:38 AM, varun sharma <varunsharman...@gmail.com>
> wrote:
>
>> Also, as you can see from the timestamps in the attached image, batches coming
>> after the Cassandra server comes up (21:04) are processed, and batches which
>&
>
> TD
>
> On Mon, Oct 19, 2015 at 5:48 AM, varun sharma <varunsharman...@gmail.com>
> wrote:
>
>> Hi,
>> I am facing this issue consistently in spark-cassandra-kafka *streaming
>> job.*
>> *Spark 1.4.0*
>> *cassandra connector 1.4.0-M3*
>>
at would
> improve your particular use case and vote for it if so :-)
>
> -kr, Gerard.
>
> On Sat, Oct 3, 2015 at 3:53 PM, varun sharma <varunsharman...@gmail.com>
> wrote:
>
>> Thanks Gerard, the code snippet you shared worked, but can you please
>>
}
>
>
> I'm wondering: would something like this (
> https://datastax-oss.atlassian.net/browse/SPARKC-257) better fit your
> purposes?
>
> -kr, Gerard.
>
> On Fri, Oct 2, 2015 at 8:12 PM, varun sharma <varunsharman...@gmail.com>
> wrote:
>
>> Hi Adrian,
>
>> each topic, since I need to process data differently depending on the
>> topic. I basically want to split the RDD consisting of N topics into N
>> RDD's, each having 1 topic.
>>
>> Any help would be appreciated.
>>
>> Thanks in advance,
>> Udit
>>
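The usual answer here is one filter pass per topic (e.g. `rdd.filter` on a topic-tagged record, giving N RDDs from one). A plain-Python sketch of the same splitting logic, with an assumed `(topic, value)` record shape:

```python
def split_by_topic(records, topics):
    """Return {topic: [values]} - mirrors N separate rdd.filter(...) calls."""
    return {t: [v for topic, v in records if topic == t] for t in topics}

# One mixed batch covering three topics.
mixed = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
per_topic = split_by_topic(mixed, ["a", "b", "c"])
```

The cost is N passes over the data; if per-topic handling can be expressed as a function of the topic instead, a single pass that dispatches on the topic tag avoids that.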
>
>
lose that data which is not yet scheduled?*
My workers are going OOM over time. I am running a streaming job in Spark
1.4.0.
Here is the heap dump of the workers:
16,802 instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded
by "sun.misc.Launcher$AppClassLoader @ 0xdff94088" occupy 488,249,688
(95.80%) bytes. These instances are
This works for me:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -DskipTests clean package
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Installation-Maven-PermGen-OutOfMemoryException-tp20831p20868.html
Sent
Hi Kevin,
Were you able to build Spark with the command
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pdeb -DskipTests clean package ?
I am getting the below error for all versions of Spark (even 1.2.0):
Failed to execute goal org.vafer:jdeb:0.11:jdeb (default) on