Re: ElasticSearch enrich

2014-06-27 Thread boci
Another question: inside the foreachRDD I will initialize the JobConf, but at that point how can I get information from the items? I have an identifier in the data which identifies the required ES index, so how can I set a dynamic index in the foreachRDD? b0c1
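
A minimal sketch of one answer (not from the thread): with an es-hadoop release that supports multi-resource writes, the index does not have to be computed in the foreachRDD body at all; a {field} pattern in es.resource.write picks it up from each document. The field name "esIndex" is an assumption for illustration.

    import org.apache.hadoop.mapred.JobConf

    // Configure the JobConf built inside foreachRDD so that es-hadoop resolves
    // the target index per document from its "esIndex" field (assumed name).
    def configureDynamicIndex(jobConf: JobConf): JobConf = {
      jobConf.set("es.input.json", "true")              // documents are raw JSON strings
      jobConf.set("es.resource.write", "{esIndex}/doc") // index taken from each document
      jobConf
    }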

Re: ElasticSearch enrich

2014-06-27 Thread boci
Ok, I found dynamic resources, but I have a frustrating problem. This is the flow: kafka -> enrich X -> enrich Y -> enrich Z -> foreachRDD -> save. My problem is: if I do this it doesn't work, the enrich functions are not called, but if I put in a print they are. For example, if I do this: kafka -> enrich X ->
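
The symptom is consistent with Spark's lazy evaluation: DStream transformations only run when an output operation is registered, and inside foreachRDD the chained maps only run once the body calls an RDD action. A hedged sketch (enrichX/enrichY/enrichZ and saveToEs are placeholders, not real APIs):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    // Placeholder enrichment steps and sink, for illustration only.
    def enrichX(s: String): String = s
    def enrichY(s: String): String = s
    def enrichZ(s: String): String = s
    def saveToEs(rdd: RDD[String]): Unit = rdd.foreach(_ => ()) // stand-in for a real action

    def pipeline(kafkaStream: DStream[String]): Unit = {
      // Transformations are lazy: none of the enrich functions run here.
      val enriched = kafkaStream.map(enrichX).map(enrichY).map(enrichZ)

      // foreachRDD registers an output operation, but the enrich maps only execute
      // once the body triggers an RDD action (save, foreach, collect, ...).
      // print() "works" because it is itself an output operation.
      enriched.foreachRDD(rdd => saveToEs(rdd))
    }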

Re: ElasticSearch enrich

2014-06-27 Thread Holden Karau
So a few quick questions: 1) What cluster are you running this against? Is it just local? Have you tried local[4]? 2) When you say breakpoint, how are you setting this breakpoint? There is a good chance your breakpoint mechanism doesn't work in a distributed environment; could you instead cause

RE: ElasticSearch enrich

2014-06-27 Thread Adrian Mocanu
Wow, thanks for your fast answer, it helps a lot... b0c1 -- Skype: boci13, Hangout: boci.b...@gmail.com On Thu, Jun 26, 2014 at 11:48

Re: ElasticSearch enrich

2014-06-27 Thread boci
This is simply a scalatest. I start a SparkConf, set the master to local (set the serializer etc.), bring up the kafka and ES connections, send a message to kafka and wait 30 sec for processing. It runs in IDEA, no magic trick. b0c1
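
Roughly, the shape of the test being described (a hedged sketch: names, timings and the omitted Kafka/ES wiring are assumptions, not the original code):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.scalatest.{BeforeAndAfterAll, FunSuite}

    class EnrichPipelineSpec extends FunSuite with BeforeAndAfterAll {
      private var ssc: StreamingContext = _

      override def beforeAll(): Unit = {
        val conf = new SparkConf()
          .setMaster("local")   // as described here; the reply below suggests local[4]
          .setAppName("enrich-test")
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        ssc = new StreamingContext(conf, Seconds(1))
      }

      override def afterAll(): Unit = if (ssc != null) ssc.stop()

      test("a message sent to kafka ends up enriched in ES") {
        // build the kafka input DStream and wire the enrich steps here, then:
        ssc.start()
        Thread.sleep(30000)   // crude 30 s wait for processing, as in the described test
        // finally assert against ES
      }
    }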

Re: ElasticSearch enrich

2014-06-27 Thread Holden Karau
Try setting the master to local[4]. On Fri, Jun 27, 2014 at 2:17 PM, boci boci.b...@gmail.com wrote: This is simply a scalatest. I start a SparkConf, set the master to local (set the serializer etc.), bring up the kafka and ES connections, send a message to kafka and wait 30 sec for processing. It
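
For context, a sketch of that one-line change and why it matters, assuming a receiver-based Kafka input (the receiver permanently occupies one core):

    import org.apache.spark.SparkConf

    object TestConf {
      // With master "local" there is a single core; the streaming receiver takes it,
      // leaving nothing to process batches, so downstream work never seems to run.
      // "local[4]" (anything >= 2) leaves cores free for the batch processing.
      val conf: SparkConf = new SparkConf()
        .setMaster("local[4]")
        .setAppName("enrich-test")
    }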

Re: ElasticSearch enrich

2014-06-26 Thread boci
That's okay, but Hadoop has ES integration. What happens if I run saveAsHadoopFile without Hadoop (or do I need to bring up Hadoop programmatically, if I can)? b0c1

Re: ElasticSearch enrich

2014-06-26 Thread Nick Pentreath
You can just add elasticsearch-hadoop as a dependency to your project to use the ESInputFormat and ESOutputFormat ( https://github.com/elasticsearch/elasticsearch-hadoop). Some other basics here: http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/spark.html For testing, yes I
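
As a hedged sketch of that setup: the dependency is roughly "org.elasticsearch" % "elasticsearch-hadoop" % "<version matching your cluster>" in sbt, and a read through the Hadoop input format looks something like the following (index name and node address are placeholders; the class is spelled ESInputFormat in some older releases):

    import org.apache.hadoop.io.{MapWritable, Text}
    import org.apache.hadoop.mapred.JobConf
    import org.apache.spark.SparkContext
    import org.elasticsearch.hadoop.mr.EsInputFormat

    def readFromEs(sc: SparkContext): Unit = {
      val jobConf = new JobConf(sc.hadoopConfiguration)
      jobConf.set("es.nodes", "localhost:9200")
      jobConf.set("es.resource", "tweets/tweet")   // index/type to read
      val docs = sc.hadoopRDD(jobConf,
        classOf[EsInputFormat[Text, MapWritable]],
        classOf[Text], classOf[MapWritable])
      docs.take(5).foreach { case (id, doc) => println(s"$id -> $doc") }
    }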

Re: ElasticSearch enrich

2014-06-26 Thread boci
Thanks. Without the local option I can connect to ES remotely; now I only have one problem. How can I use elasticsearch-hadoop with Spark Streaming? I mean DStream doesn't have a saveAsHadoopFiles method, and my second problem is that the output index depends on the input data. Thanks

Re: ElasticSearch enrich

2014-06-26 Thread Holden Karau
Hi b0c1, I have an example of how to do this in the repo for my talk as well; the specific example is at https://github.com/holdenk/elasticsearchspark/blob/master/src/main/scala/com/holdenkarau/esspark/IndexTweetsLive.scala . Since DStream doesn't have a saveAsHadoopDataset we use foreachRDD and
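
In outline, that pattern looks like the sketch below. This is not the code from the linked IndexTweetsLive.scala, just a hedged reconstruction: index name, node address and the JSON-document representation are assumptions, and the exact JobConf keys and output-format class name vary between es-hadoop releases.

    import org.apache.hadoop.io.{NullWritable, Text}
    import org.apache.hadoop.mapred.JobConf
    import org.apache.spark.SparkContext._   // pair-RDD implicits on older Spark versions
    import org.apache.spark.streaming.dstream.DStream
    import org.elasticsearch.hadoop.mr.EsOutputFormat

    def writeToEs(jsonDocs: DStream[String]): Unit =
      jsonDocs.foreachRDD { rdd =>
        val jobConf = new JobConf(rdd.sparkContext.hadoopConfiguration)
        jobConf.setOutputFormat(classOf[EsOutputFormat])
        jobConf.setOutputKeyClass(classOf[NullWritable])
        jobConf.setOutputValueClass(classOf[Text])
        jobConf.set("es.nodes", "localhost:9200")
        jobConf.set("es.input.json", "true")       // values below are raw JSON strings
        jobConf.set("es.resource", "tweets/tweet") // fixed index/type; see the {field} pattern above for a dynamic one
        rdd.map(json => (NullWritable.get(), new Text(json)))
           .saveAsHadoopDataset(jobConf)
      }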

Re: ElasticSearch enrich

2014-06-26 Thread boci
Wow, thanks for your fast answer, it helps a lot... b0c1 -- Skype: boci13, Hangout: boci.b...@gmail.com On Thu, Jun 26, 2014 at 11:48 PM, Holden Karau

Re: ElasticSearch enrich

2014-06-26 Thread Holden Karau
Just your luck, I happened to be working on that very talk today :) Let me know how your experiences with Elasticsearch Spark go :) On Thu, Jun 26, 2014 at 3:17 PM, boci boci.b...@gmail.com wrote: Wow, thanks for your fast answer, it helps a lot... b0c1

Re: ElasticSearch enrich

2014-06-25 Thread boci
Hi guys, thanks for the direction. Now I have some problems/questions: - in local (test) mode I want to use ElasticClient.local to create the ES connection, but in production I want to use ElasticClient.remote; for this I want to pass the ElasticClient to mapPartitions, or what is the best practice? - my stream
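
One common shape for this, sketched under assumptions (the ElasticClient.local/remote names suggest elastic4s; the enrichment body, settings fields and the exact client shutdown method are placeholders): ship only a small serializable settings object into the closure and build the client inside mapPartitions, on the worker, once per partition.

    import com.sksamuel.elastic4s.ElasticClient
    import org.apache.spark.rdd.RDD

    // Only this small case class is serialized into the closure.
    case class EsSettings(local: Boolean, host: String = "localhost", port: Int = 9300)

    def enrichWithEs(rdd: RDD[String], settings: EsSettings): RDD[String] =
      rdd.mapPartitions { docs =>
        // The (non-serializable) client is created here, on the worker.
        val client =
          if (settings.local) ElasticClient.local
          else ElasticClient.remote(settings.host, settings.port)
        val enriched = docs.map { doc =>
          doc // enrich `doc` using `client` here
        }.toList              // force the iterator before releasing the client
        client.close()        // shutdown method name may differ between elastic4s versions
        enriched.iterator
      }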

Re: ElasticSearch enrich

2014-06-25 Thread Holden Karau
On Wed, Jun 25, 2014 at 4:16 PM, boci boci.b...@gmail.com wrote: Hi guys, thanks for the direction. Now I have some problems/questions: - in local (test) mode I want to use ElasticClient.local to create the ES connection, but in production I want to use ElasticClient.remote; for this I want to pass

Re: ElasticSearch enrich

2014-06-24 Thread Peng Cheng
Make sure all queries are called through class methods and wrap your query info with a class having only simple properties (strings, collections etc.). If you can't find such a wrapper you can also use the SerializableWritable wrapper out-of-the-box, but it's not recommended (developer API and makes fat
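
A hedged illustration of "wrap your query info with a class having only simple properties" (all names made up): only a small descriptor crosses the wire, and the actual client and query objects are built where the task runs.

    // Only simple, serializable fields; safe for a map/mapPartitions closure to capture.
    case class EsQuerySpec(index: String, docType: String, queryJson: String)

    // Example descriptor; the worker-side code turns it into a real query.
    val matchAllEvents = EsQuerySpec("events-2014.06", "doc", """{"query":{"match_all":{}}}""")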

Re: ElasticSearch enrich

2014-06-24 Thread boci
Ok, but in this case where can I store the ES connection? Or does every document create a new ES connection inside the worker? -- Skype: boci13, Hangout: boci.b...@gmail.com On

Re: ElasticSearch enrich

2014-06-24 Thread Mayur Rustagi
Most likely the ES client is not serializable for you. You can do 3 workarounds: 1. Switch to Kryo serialization and register the client in Kryo; this might solve your serialization issue. 2. Use mapPartitions for all your data and initialize your client in the mapPartitions code; this will create a client for each
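
A minimal sketch of workaround 2 on the write path (SimpleEsClient and the factory function are hypothetical stand-ins for whatever client is in use): the client is constructed inside the partition-level closure, so it is created on the worker and never serialized.

    import org.apache.spark.rdd.RDD

    trait SimpleEsClient { def index(doc: String): Unit; def close(): Unit }

    def saveWithPerPartitionClient(rdd: RDD[String], newClient: () => SimpleEsClient): Unit =
      rdd.foreachPartition { docs =>
        val client = newClient()   // one client per partition, built on the worker
        try docs.foreach(client.index)
        finally client.close()
      }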

Re: ElasticSearch enrich

2014-06-24 Thread Peng Cheng
I'm afraid persisting a connection across two tasks is a dangerous act, as they can't be guaranteed to be executed on the same machine. Your ES server may think it's a man-in-the-middle attack! I think it's possible to invoke a static method that gives you a connection from a local 'pool', so nothing
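
In Scala, the usual way to get that "static method handing out a connection from a local pool" is a per-JVM singleton object with a lazy client. A hedged sketch (SimpleEsClient and its construction are placeholders): each executor builds the client once, on first use inside a task, and it is never shipped from the driver.

    object EsConnection {
      trait SimpleEsClient { def index(doc: String): Unit }

      // Initialized lazily on first use inside a task, i.e. on the executor JVM.
      lazy val client: SimpleEsClient = new SimpleEsClient {
        def index(doc: String): Unit = () // real client construction/logic goes here
      }
    }

    // Usage inside a task: rdd.foreach(doc => EsConnection.client.index(doc))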

Re: ElasticSearch enrich

2014-06-24 Thread Mayur Rustagi
It's not used as the default serializer because of some compatibility issues and the requirement to register the classes. Which part are you getting as non-serializable? You need to serialize that class if you are sending it to Spark workers inside a map, reduce, mapPartitions or any of the operations on
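
For reference, a hedged sketch of switching to Kryo and registering a class that gets shipped to the workers (MyEnrichment and the registrator name are made-up examples; the property-based configuration shown is the style used in Spark of that era):

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    case class MyEnrichment(id: String, esIndex: String)

    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit =
        kryo.register(classOf[MyEnrichment])
    }

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "MyRegistrator") // use the fully-qualified class name in a real project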