("meta-inf/services/") =>
MergeStrategy.filterDistinctLines
case "reference.conf"=>
MergeStrategy.concat
case _ =>
MergeStrategy.first
}
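For reference, this fragment is the tail of an sbt-assembly merge-strategy setting; a sketch of the surrounding block (assuming sbt-assembly 0.14.x, where the key is assemblyMergeStrategy; older plugin versions use mergeStrategy in assembly instead) would be:

assemblyMergeStrategy in assembly := {
  // service registry files are merged line-by-line rather than overwritten
  case m if m.toLowerCase.startsWith("meta-inf/services/") =>
    MergeStrategy.filterDistinctLines
  // Typesafe config files are concatenated so every library keeps its defaults
  case "reference.conf" =>
    MergeStrategy.concat
  // everything else: keep the first copy seen
  case _ =>
    MergeStrategy.first
}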
Thanks & Regards,
Vinti
On Wed, Feb
t state and what is new data.
>
>
>
> Assuming it works like the Java API, to use this function to maintain
> state you must call State.update; the function can return anything, not
> just the state.
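A minimal Scala sketch of such a mapping function (the key, value and state types here are assumed, not taken from the original program):

import org.apache.spark.streaming.{State, StateSpec}

// The State must be updated explicitly; the return value is what flows downstream.
def trackingFunc(key: String, value: Option[Int], state: State[Int]): Option[(String, Int)] = {
  val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
  state.update(sum)      // persist the new running total
  Some((key, sum))       // can return anything, not just the state itself
}

// wiring (stream is an assumed DStream[(String, Int)]):
//   stream.mapWithState(StateSpec.function(trackingFunc _))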
>
>
>
> Cheers
>
> Iain
>
>
>
> *From:* Vinti Maheshw
Hi All,
I wanted to replace my updateStateByKey function with mapWithState function
(Spark 1.6) to improve performance of my program.
I was following these two documents:
https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-spark-streaming.html
I have 2 machines in my cluster with the below specifications:
128 GB RAM and 8 cores machine
Regards,
~Vinti
On Sun, Mar 6, 2016 at 7:54 AM, Vinti Maheshwari <vinti.u...@gmail.com>
wrote:
> Thanks Supreeth and Shahbaz. I will try adding
> spark.streaming.kafka.maxRatePerPartition
>> Setting spark.streaming.kafka.maxRatePerPartition can help control the
>> number of messages read from Kafka per partition on the Spark Streaming
>> consumer.
>>
>> -S
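For reference, a hedged sketch of how that cap is usually set (the values are illustrative, not a recommendation):

// Limit each Kafka partition to N records per second per micro-batch of the
// direct stream; optionally let Spark tune the rate itself (Spark 1.5+).
val conf = new org.apache.spark.SparkConf()
  .setAppName("kafka-streaming")
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
  .set("spark.streaming.backpressure.enabled", "true")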
>>
>>
>> On Mar 5, 2016, at 10:02 PM, Vinti Maheshwari <vinti.u...@gmail.com>
Hello,
I am trying to figure out why my Kafka + Spark job is running slow. I found
that Spark is consuming all the messages out of Kafka into a single batch
and is not sending any messages to the other batches.
2016/03/05 21:57:05
shixi...@databricks.com
> wrote:
> Hey,
>
> KafkaUtils.createDirectStream doesn't need a StorageLevel as it doesn't
> store blocks to BlockManager. However, the error is not related
> to StorageLevel. It may be a bug. Could you provide more info about it?
> E.g., Spark version,
Increasing SPARK_EXECUTOR_INSTANCES to 4 worked.
SPARK_EXECUTOR_INSTANCES="4" #Number of workers to start (Default: 2)
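The same request for four executors can also be made at submit time with --num-executors 4, or programmatically (a sketch; YARN deployment assumed):

// equivalent to SPARK_EXECUTOR_INSTANCES="4" in spark-env.sh (YARN)
val conf = new org.apache.spark.SparkConf()
  .set("spark.executor.instances", "4")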
Regards,
Vinti
On Wed, Mar 2, 2016 at 4:28 AM, Vinti Maheshwari <vinti.u...@gmail.com>
wrote:
> Thanks much Saisai. Got it.
> So i think increasing
ire to store the input data ahead of time. Only receiver-based
> approach could specify the storage level.
>
> Thanks
> Saisai
>
> On Wed, Mar 2, 2016 at 7:08 PM, Vinti Maheshwari <vinti.u...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I wanted to set *Storag
Hi All,
I wanted to set *StorageLevel.MEMORY_AND_DISK_SER* in my spark-streaming
program, as currently I am getting *MetadataFetchFailedException*. I am not
sure where I should pass StorageLevel.MEMORY_AND_DISK, as it seems like
createDirectStream doesn't allow passing that parameter.
val
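For comparison, a hedged sketch of the two KafkaUtils entry points in the Spark 1.6 / Kafka 0.8 integration (ssc, kafkaParams, topicsSet, zkQuorum, groupId and topicMap are assumed to already exist):

import kafka.serializer.StringDecoder
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils

// Direct stream: no StorageLevel parameter, because there is no receiver
// storing blocks; records are read from Kafka on demand per batch.
val direct = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicsSet)

// Receiver-based stream: this is the variant that accepts a StorageLevel.
val receiver = KafkaUtils.createStream(
  ssc, zkQuorum, groupId, topicMap, StorageLevel.MEMORY_AND_DISK_SER)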
may have better luck
> on a perl or kafka related list.
>
> On Mon, Feb 29, 2016 at 3:26 PM, Vinti Maheshwari <vinti.u...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I wrote a Kafka producer using the Kafka Perl API, but I am getting an
>> error when I am passing
Thanks much Amit, Sebastian. It worked.
Regards,
~Vinti
On Sat, Feb 27, 2016 at 12:44 PM, Amit Assudani <aassud...@impetus.com>
wrote:
> Your context is not being created using checkpoints; use getOrCreate.
>
> From: Vinti Maheshwari <vinti.u...@gmail.com>
> Date: Sat
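The getOrCreate pattern Amit refers to looks roughly like this (checkpointDir and conf are assumed names, not taken from the original program):

import org.apache.spark.streaming.{Seconds, StreamingContext}

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // ... define the DStream graph here ...
  ssc
}

// On a fresh start this calls createContext(); after a restart the context is
// rebuilt from the checkpoint data instead, so state and offsets are recovered.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()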
filter {x => x.data.getName.matches("sbt.*") ||
x.data.getName.matches(".*macros.*")}}
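That filter is typically the body of sbt-assembly's excludedJars setting; a sketch of the full form (key names assumed from sbt-assembly) is:

excludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  // drop sbt's own jars and the macro jars from the assembled fat jar
  cp filter { x =>
    x.data.getName.matches("sbt.*") || x.data.getName.matches(".*macros.*")
  }
}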
Thanks,
~Vinti
On Wed, Feb 24, 2016 at 12:55 PM, Vinti Maheshwari <vinti.u...@gmail.com>
wrote:
> Thanks much Cody, I added assembly.sbt and modified build.sbt with ivy bug
> related con
build a jar with
> all the dependencies.
>
> On Wed, Feb 24, 2016 at 1:50 PM, Vinti Maheshwari <vinti.u...@gmail.com>
> wrote:
>
>> I am not using sbt assembly currently. I need to check how to use sbt
>> assembly.
>>
>> Regards,
>> ~Vinti
>>
&g
le jar along with your code. Otherwise
> you'd have to specify each separate jar in your spark-submit line, which is
> a pain.
>
> On Wed, Feb 24, 2016 at 12:49 PM, Vinti Maheshwari <vinti.u...@gmail.com>
> wrote:
>
>> Hi Cody,
>>
>> I tried with the build file
r/kafka-exactly-once/blob/master/build.sbt
>
> includes some hacks for ivy issues that may no longer be strictly
> necessary, but try that build and see if it works for you.
>
>
> On Wed, Feb 24, 2016 at 11:14 AM, Vinti Maheshwari <vinti.u...@gmail.com>
> wrote:
>
Hello,
I have tried multiple different settings in build.sbt, but it seems like
nothing is working.
Can anyone suggest the right syntax/way to include kafka with spark?
Error
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/spark/streaming/kafka/KafkaUtils$
build.sbt
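Not the poster's actual file, but a minimal build.sbt sketch for Spark 1.6 on Scala 2.10 that is usually enough to make KafkaUtils resolve:

name := "spark-streaming-kafka-app"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"            % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-streaming"       % "1.6.0" % "provided",
  // not marked "provided": this one must end up in the assembled jar
  "org.apache.spark" %% "spark-streaming-kafka" % "1.6.0"
)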
Hi All,
I wrote a program for Spark Streaming in Scala. In my program, I passed
'remote-host' and 'remote-port' to socketTextStream.
And on the remote machine, I have one Perl script that calls the system
command:
echo 'data_str' | nc <>
In that way, my spark program is able to get data,
e "telnet" to
> test if the network is normal?
>
> On Mon, Feb 22, 2016 at 6:59 AM, Vinti Maheshwari <vinti.u...@gmail.com>
> wrote:
>
>> For reference, my program:
>>
>> def main(args: Array[String]): Unit = {
>> val conf = new SparkConf()
dl.So.gcda,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
On Mon, Feb 22, 2016 at 6:38 AM, Vinti Maheshwari <vinti.u...@gmail.com>
wrote:
> Hi
>
> I am in a Spark Streaming context, and I am reading input from the
> socket using nc
Hi,
I am in a Spark Streaming context, and I am reading input from the socket
using nc -lk . When I am running it and manually giving input, it's
working. But if input is coming from a different IP to this socket, then
Spark is not reading that input, though it's showing all the input coming
, record._4, record._5))
>>> .reduceByKey((r1, r2) => (r1._1 + r2._1, r1._2 + r2._2, r1._3 +
>>> r2._3))
>>> .foreachRDD(rdd => {
>>>   rdd.collect().foreach((fileName, valueTuple) => <insert into your
>>> global map here>)
>>> })
>> rdd.collect().foreach((fileName, valueTuple) => <insert into your global
>> map here>)
>> })
>>
>> --
>> Thanks
>> Jatin Kumar | Rocket Scientist
>> +91-7696741743 m
>>
>> On Sun, Feb 21, 2016 at 11:30 PM, Vinti Maheshwari <vinti.u...@gmail.com>
>>
Never mind; it seems an executor-level mutable map is not recommended, as
stated in
http://blog.guillaume-pitel.fr/2015/06/spark-trick-shot-node-centered-aggregator/
On Sun, Feb 21, 2016 at 9:54 AM, Vinti Maheshwari <vinti.u...@gmail.com>
wrote:
> Thanks for your reply Jatin. I c
1, r2) => (r1._1 + r2._1, r1._2 + r2._2, r1._3 +
> r2._3))
>
> I hope that helps.
>
> --
> Thanks
> Jatin Kumar | Rocket Scientist
> +91-7696741743 m
>
> On Sun, Feb 21, 2016 at 10:35 PM, Vinti Maheshwari <vinti.u...@gmail.com>
> wrote:
>
>> Hello,
>
Hello,
I have input lines like below
*Input*
t1, file1, 1, 1, 1
t1, file1, 1, 2, 3
t1, file2, 2, 2, 2, 2
t2, file1, 5, 5, 5
t2, file2, 1, 1, 2, 2
and I want to achieve output like the rows below, which is a vertical
addition of the corresponding numbers.
*Output*
“file1” : [ 1+1+5, 1+2+5, 1+3+5
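A hedged sketch of that column-wise aggregation (lines is assumed to be a DStream[String] of these comma-separated records, and rows belonging to the same file are assumed to have the same number of columns):

val sums = lines
  .map { line =>
    val parts = line.split(",").map(_.trim)
    (parts(1), parts.drop(2).map(_.toLong))   // (fileName, numeric columns)
  }
  .reduceByKey((a, b) => a.zip(b).map { case (x, y) => x + y })  // column-wise sums

sums.foreachRDD { rdd =>
  rdd.collect().foreach { case (fileName, totals) =>
    println(s"$fileName : [ ${totals.mkString(", ")} ]")
  }
}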
Hi,
Sorry, please ignore my message; it was sent by mistake. I am still
drafting.
Regards,
Vinti
On Mon, Feb 1, 2016 at 2:25 PM, Vinti Maheshwari <vinti.u...@gmail.com>
wrote:
> Hi All,
>
> I recently started learning Spark. I need to use spark-streaming.
>
> 1) Inp
Hi,
I am new to Spark. I wanted to set up Spark Streaming to retrieve key-value
pairs from files of the below format:
file: info1
Note: Each info file will have around 1,000 of these records, and our
system is continuously generating info files. So through Spark Streaming I
wanted to aggregate the results.
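Since the exact record format is not shown above, here is a hedged sketch assuming one "key value" pair per line and an HDFS directory that receives the new info files:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("info-file-aggregation")
val ssc  = new StreamingContext(conf, Seconds(30))

// Pick up files as they appear in the monitored directory and sum per key.
val pairs = ssc.textFileStream("hdfs:///path/to/info-files")
  .map(_.split("\\s+"))
  .filter(_.length >= 2)
  .map(a => (a(0), a(1).toLong))

pairs.reduceByKey(_ + _).print()

ssc.start()
ssc.awaitTermination()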
Hi All,
I recently started learning Spark. I need to use spark-streaming.
1) Input: I need to read from MongoDB
db.event_gcovs.find({executions:"56791a746e928d7b176d03c0", valid:1,
infofile:{$exists:1}, geo:"sunnyvale"}, {infofile:1}).count()
> Number of Info files: 24441
/* 0 */
{
"_id" :