using replace function

2016-12-11 Thread Hitesh Goyal
Hi team, I am using Apache Spark 1.6.1 and I want to access data from S3 using Spark SQL. For this I need the MySQL REPLACE function: DataFrame df = sqlContext.sql("Select replace(column_name,'ur','xy') from table_name"); But when I try to run this, it gives an error that the function Replace is not
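A likely explanation: Spark 1.6.x has no `replace` SQL function (a string `replace` only arrived later), but `regexp_replace` has been available since Spark 1.5 and covers this case. A minimal sketch, assuming an existing `sqlContext` and a registered temp table named `table_name`:

```scala
import org.apache.spark.sql.functions.{col, regexp_replace}

// Spark 1.6.x: use regexp_replace instead of MySQL's REPLACE.
val df = sqlContext.sql(
  "SELECT regexp_replace(column_name, 'ur', 'xy') AS column_name FROM table_name")

// Equivalent DataFrame API form:
val df2 = sqlContext.table("table_name")
  .withColumn("column_name", regexp_replace(col("column_name"), "ur", "xy"))
```

Note that `regexp_replace` treats the pattern as a regular expression, so special characters would need escaping; for a plain two-character pattern like 'ur' the behavior matches MySQL's REPLACE.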

Spark 2 or Spark 1.6.x?

2016-12-11 Thread Lohith Samaga M
Hi, I am new to Spark. I would like to learn Spark. I think I should learn version 2.0.2. Or should I still go for version 1.6.x and then come to version 2.0.2? Please advise. Thanks in advance. Best regards / Mit freundlichen Grüßen / Sincères salutations M.

RE: [Spark Streaming] How to join two messages in Spark Streaming (probably the messages are in different RDDs)?

2016-12-11 Thread Sanchuan Cheng (sancheng)
[Attachment: smime.p7m (S/MIME encrypted message)]

Re: Monitoring the User Metrics for a long running Spark Job

2016-12-11 Thread Chawla,Sumit
Thanks a lot Sonal. I will give it a try. Regards Sumit Chawla On Wed, Dec 7, 2016 at 10:45 PM, Sonal Goyal wrote: > You can try updating metrics.properties for the sink of your choice. In > our case, we add the following for getting application metrics in JSON >
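For context: `conf/metrics.properties` is where Spark's metrics sinks are configured, and the built-in MetricsServlet (enabled by default) already serves application metrics as JSON at `/metrics/json` on the driver UI. A hedged config sketch; the exact lines Sonal's team added are not shown in the thread, so this only illustrates the file's shape with a standard sink:

```properties
# conf/metrics.properties (illustrative sketch)
# The MetricsServlet is enabled by default and serves JSON at /metrics/json.
# Additional sinks can be enabled per instance, for example the console sink:
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds
```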

Re: unit testing in spark

2016-12-11 Thread Juan Rodríguez Hortalá
Hi all, I would also like to participate in that. Greetings, Juan On Fri, Dec 9, 2016 at 6:03 AM, Michael Stratton < michael.strat...@komodohealth.com> wrote: > That sounds great, please include me so I can get involved. > > On Fri, Dec 9, 2016 at 7:39 AM, Marco Mistroni

Re: Spark Streaming with Kafka

2016-12-11 Thread Oleksii Dukhno
Hi Anton, What is the command you run your Spark app with? Why not work with the data instead of the stream in your second-stage operation? Can you provide logs for the issue? ConcurrentModificationException is not a Spark issue; it means that you use the same Kafka consumer instance from more than

Re: Spark Streaming with Kafka

2016-12-11 Thread Anton Okolnychyi
Sorry, I forgot to mention that I was using Spark 2.0.2, Kafka 0.10, and nothing custom. I will try to restate the initial question. Let's consider an example. 1. I create a stream and subscribe to a certain topic. val stream = KafkaUtils.createDirectStream(...) 2. I extract the actual data

Re: Few questions on reliability of accumulators value.

2016-12-11 Thread Sudev A C
Please help. Anyone, any thoughts on the previous mail? Thanks Sudev On Fri, Dec 9, 2016 at 2:28 PM Sudev A C wrote: > Hi, > > Can anyone please help clarify how accumulators can be used reliably to > measure error/success/analytical metrics? > > Given below is use
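The key caveat the Spark programming guide documents: accumulator updates made inside transformations may be applied more than once if a task is retried or a stage is recomputed, so only updates performed inside actions are guaranteed exactly-once. A sketch, assuming an existing SparkContext `sc` and hypothetical helpers `isValid`/`parse`:

```scala
// Assumes `sc` (SparkContext) and an `input` RDD exist; isValid/parse are hypothetical.
val errors = sc.accumulator(0L, "parseErrors")

// Risky: map is a transformation; if the stage is recomputed (task retry,
// lost cache, speculative execution) the increments are re-applied and the
// count can be inflated.
val parsed = input.map { line =>
  if (!isValid(line)) errors += 1L
  parse(line)
}

// Safer: update the accumulator inside an action such as foreach, where
// Spark guarantees each task's update is applied exactly once.
input.foreach { line =>
  if (!isValid(line)) errors += 1L
}
```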

Re: Spark Streaming with Kafka

2016-12-11 Thread Timur Shenkao
Hi, Usual general questions are: -- what is your Spark version? -- what is your Kafka version? -- do you use the "standard" Kafka consumer or do you try to implement something custom (your own multi-threaded consumer)? The freshest docs

Spark Streaming with Kafka

2016-12-11 Thread Anton Okolnychyi
Hi, I am experimenting with Spark Streaming and Kafka. I would appreciate it if someone could say whether the following assumption is correct. If I have multiple computations (each with its own output) on one stream (created via KafkaUtils.createDirectStream), then there is a chance to have
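One way this scenario can go wrong: with the Kafka 0.10 direct stream, each output action triggers its own consumption of the batch's RDD, and the cached Kafka consumer is not thread-safe, which is a plausible path to the ConcurrentModificationException discussed in this thread. Persisting the stream before branching lets Kafka be read once per batch. A sketch, assuming Spark 2.0.x with spark-streaming-kafka-0-10; `ssc`, `topics`, and `kafkaParams` are assumed to be defined elsewhere:

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka010._

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](topics, kafkaParams))

// Read from Kafka once per batch and reuse the data for both outputs.
val data = stream.map(record => record.value)
data.persist(StorageLevel.MEMORY_ONLY)

data.filter(_.contains("A")).print() // first computation/output
data.filter(_.contains("B")).print() // second computation/output
```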

Re: Writing data from Spark Streaming to AWS Redshift?

2016-12-11 Thread kant kodali
@shyla a side question: What can Redshift do that Spark cannot? Trying to understand your use case. On Fri, Dec 9, 2016 at 8:47 PM, ayan guha wrote: > Ideally, saving data to external sources should not be any different. Give > the write options as stated in the
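As ayan suggests, the write path looks like any other data source. At the time, the common route to Redshift was the databricks/spark-redshift connector, which stages each batch in S3 and issues a Redshift COPY. A hedged sketch inside foreachRDD; the URL, table, bucket, and `schema` are placeholders, and the option names follow that connector's documentation:

```scala
// Write each micro-batch to Redshift via the spark-redshift connector.
// `spark` (SparkSession) and `schema` are assumed to exist; values are placeholders.
stream.foreachRDD { rdd =>
  val df = spark.createDataFrame(rdd, schema)
  df.write
    .format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://host:5439/db?user=...&password=...")
    .option("dbtable", "target_table")
    .option("tempdir", "s3n://bucket/tmp/")
    .mode("append")
    .save()
}
```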

Re: Random Forest hangs without trace of error

2016-12-11 Thread Marco Mistroni
OK. Did you change the Spark version? Java/Scala/Python version? Have you tried different versions of any of the above? Hope this helps. Kr On 10 Dec 2016 10:37 pm, "Morten Hornbech" wrote: > I haven’t actually experienced any non-determinism. We have nightly > integration