Re: Re: Exception when trying to use EShadoop connector and writing rdd to ES

2015-02-10 Thread Costin Leau
What's the signature of your RDD? It looks to be a List, which can't be mapped automatically to a document - you are 
probably thinking of a tuple or, better yet, a PairRDD.

Convert your RDD to a Pair and use that instead.

This is a guess - a gist with a simple test/code would make it easier to 
diagnose what's going on.
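To make the shape difference concrete, here is a minimal sketch in plain Python (no Spark needed; the sample data is invented to mirror the thread's `{'SOURCES': [...]}` records): the "Data of type java.util.ArrayList cannot be used" error in the log below suggests the RDD elements reached the connector as lists rather than 2-tuples.

```python
# Minimal sketch with invented sample data: es-hadoop rejects list-shaped
# elements ("java.util.ArrayList cannot be used"); a PairRDD element is a
# (key, value) 2-tuple.
records = [
    [1, {'SOURCES': [{'k': 'v'}]}],   # list elements: rejected by the connector
    [2, {'SOURCES': [{'k': 'v'}]}],
]

# Turn each list into a 2-tuple, the element shape a PairRDD expects;
# in Spark this would be something like rdd.map(lambda e: (e[0], e[1])).
pairs = [(e[0], e[1]) for e in records]

print(pairs[0])  # (1, {'SOURCES': [{'k': 'v'}]})
```

This is only a guess at the fix, under the assumption above about what the RDD contains.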

On 2/10/15 7:24 PM, shahid ashraf wrote:

hi costin, i upgraded the es hadoop connector, and at this point i can't use 
scala, but i'm still getting the same error

On Tue, Feb 10, 2015 at 10:34 PM, Costin Leau <costin.l...@gmail.com> wrote:

Hi shahid,

I've sent the reply to the group - for some reason I replied to your 
address instead of the mailing list.
Let's continue the discussion there.

Cheers,

On 2/10/15 6:58 PM, shahid ashraf wrote:

thanks costin

i m grouping data together based on id in json and rdd contains
rdd = (1,{'SOURCES': [{n no. of key/valu}],}),(2,{'SOURCES': [{n no. of key/valu}],}),(3,{'SOURCES': [{n no. of key/valu}],}),(4,{'SOURCES': [{n no. of key/valu}],})
rdd.saveAsNewAPIHadoopFile(
    path='-',
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={
        "es.nodes" : "localhost",
        "es.port" : "9200",
        "es.resource" : "shahid/hcp_id"
    })


spark-1.1.0-bin-hadoop1
java version "1.7.0_71"
elasticsearch-1.4.2
elasticsearch-hadoop-2.1.0.Beta2.jar


On Tue, Feb 10, 2015 at 10:05 PM, Costin Leau <costin.l...@gmail.com> wrote:

 Sorry but there's too little information in this email to make any type of assessment.
 Can you please describe what you are trying to do, what version of Elastic and es-spark you are using,
 and potentially post a snippet of code?
 What does your RDD contain?


 On 2/10/15 6:05 PM, shahid wrote:

 INFO scheduler.TaskSetManager: Starting task 2.1 in stage 2.0 
(TID 9,
 ip-10-80-98-118.ec2.internal, PROCESS_LOCAL, 1025 bytes)
 15/02/10 15:54:08 INFO scheduler.TaskSetManager: Lost task 1.0 
in stage 2.0
 (TID 6) on executor ip-10-80-15-145.ec2.internal:
 org.apache.spark.SparkException (Data of type 
java.util.ArrayList cannot be
 used) [duplicate 1]
 15/02/10 15:54:08 INFO scheduler.TaskSetManager: Starting task 
1.1 in stage
 2.0 (TID 10, ip-10-80-15-145.ec2.internal, PROCESS_LOCAL, 1025 
bytes)



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Exception-when-trying-to-use-EShadoop-connector-and-writing-rdd-to-ES-tp21579.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.


 --
 Costin




--
with Regards
Shahid Ashraf

--
Costin




--
with Regards
Shahid Ashraf


--
Costin

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Exception when trying to use EShadoop connector and writing rdd to ES

2015-02-10 Thread shahid ashraf
hi costin, i upgraded the es hadoop connector, and at this point i can't
use scala, but i'm still getting the same error

On Tue, Feb 10, 2015 at 10:34 PM, Costin Leau  wrote:

> Hi shahid,
>
> I've sent the reply to the group - for some reason I replied to your
> address instead of the mailing list.
> Let's continue the discussion there.
>
> Cheers,
>
> --
> Costin
>



-- 
with Regards
Shahid Ashraf


Re: Exception when trying to use EShadoop connector and writing rdd to ES

2015-02-10 Thread Costin Leau

First off, I'd recommend using the latest es-hadoop beta (2.1.0.Beta3) or, even 
better, the dev build [1].
Second, use the native Java/Scala API [2], since both configuration and 
performance are easier there.
Third, when you are using JSON input, tell es-hadoop/spark that: the connector 
can work with either objects (the default) or raw JSON.

It just so happens that the es-hadoop docs describe the above here [3] :).

Hope this helps,

[1] 
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/install.html#download-dev
[2] 
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/spark.html#spark-native
[3] 
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/spark.html#spark-write-json
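For the raw-JSON route in the third point, a hedged sketch in plain Python (sample data invented; the runnable part is only the serialization step): each document is pre-serialized to a JSON string, which would then be paired with the connector's `es.input.json` option as documented in [3].

```python
import json

# Hedged sketch with invented sample data: pre-serialize each document to a
# raw JSON string so es-hadoop can index it as-is. In the Spark job this
# would pair with adding "es.input.json" : "yes" to the
# saveAsNewAPIHadoopFile conf, with the value class holding the JSON text.
docs = [
    (1, {'SOURCES': [{'k': 'v'}]}),
    (2, {'SOURCES': [{'k': 'v'}]}),
]

# The key becomes irrelevant once the value is the full JSON document.
json_rows = [(None, json.dumps(doc)) for _id, doc in docs]

print(json_rows[0][1])  # {"SOURCES": [{"k": "v"}]}
```

In Spark this serialization would run per-partition via a `map`, so each executor sends raw JSON strings to the connector rather than Python dicts.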



--
Costin
