Re: [hadoop] Performance in using Text vs. MapWritable

2014-03-14 Thread Brian Stempin
It does, thanks.

Brian


On Fri, Mar 14, 2014 at 11:29 AM, Costin Leau wrote:

> Hey,
>
> There is, but in the big picture it doesn't make any difference. If the
> data is already in JSON format, es-hadoop can stream it directly
> without having to do any conversion. With a Map (such as a MapWritable),
> the map has to be converted into JSON first. Note that this process is quite
> efficient and uses the same amount of memory no matter the number of
> documents/maps.
> Considering Hadoop's batch nature, I would not worry about choosing one over
> the other but rather focus on ease of use.
>
> If the data is in JSON or you want ultimate control over what data is sent
> to Elasticsearch, then JSON is the way to go - the data is streamed as is.
> If you don't use JSON and have data in various formats readable through
> Hadoop, then pick the Map - it gives you maximum
> interoperability and you don't have to worry about transforming data into
> an intermediate format.
>
> Hope this helps,
>
>
> On 3/14/2014 4:46 PM, Brian Stempin wrote:
>
>> Hi,
>> I'm currently using the elasticsearch-hadoop component to load data into
>> my ES cluster.  Currently, the ESOutputFormat
>> will accept a Map or a Text that is already in JSON
>> format.  My question:  Is there a performance
>> advantage to using one over the other?
>>
>> Thanks,
>> Brian
>
> --
> Costin

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CANB1ciC56FppVsL6tAha-oad%2BDGMP7cJMdZLPU1-RkRUN1qtkg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: [hadoop] Performance in using Text vs. MapWritable

2014-03-14 Thread Costin Leau

Hey,

There is, but in the big picture it doesn't make any difference. If the data is already in JSON format, es-hadoop can
stream it directly without having to do any conversion. With a Map (such as a MapWritable), the map has to be
converted into JSON first. Note that this process is quite efficient and uses the same amount of memory no matter the
number of documents/maps.

Considering Hadoop's batch nature, I would not worry about choosing one over the
other but rather focus on ease of use.

If the data is in JSON or you want ultimate control over what data is sent to Elasticsearch, then JSON is the way to go 
- the data is streamed as is.
If you don't use JSON and have data in various formats readable through Hadoop, then pick the Map - 
it gives you maximum interoperability and you don't have to worry about transforming data into an intermediate format.
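
To make the difference concrete, here is a minimal sketch of the two paths in plain Java. It is not es-hadoop's actual serializer: the class MapToJsonSketch and the methods fromJson and fromMap are hypothetical names made up for illustration, and a java.util.Map stands in for the Hadoop Map type so the example needs no Hadoop dependency. It only handles strings, numbers, booleans, null and nested maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; hypothetical names, not part of es-hadoop.
final class MapToJsonSketch {

    // Path 1: the record is already JSON, so it is passed through as is.
    static String fromJson(String json) {
        return json;
    }

    // Path 2: the record is a Map, so it is converted to JSON first.
    // One document is serialized per call.
    static String fromMap(Map<String, ?> doc) {
        StringBuilder out = new StringBuilder();
        writeValue(doc, out);
        return out.toString();
    }

    // Writes a nested map, number, boolean, null, or (as a fallback) a string.
    private static void writeValue(Object value, StringBuilder out) {
        if (value == null) {
            out.append("null");
        } else if (value instanceof Map) {
            out.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    out.append(',');
                }
                first = false;
                writeString(String.valueOf(e.getKey()), out);
                out.append(':');
                writeValue(e.getValue(), out);
            }
            out.append('}');
        } else if (value instanceof Number || value instanceof Boolean) {
            out.append(value);
        } else {
            writeString(value.toString(), out);
        }
    }

    // Writes a quoted JSON string, escaping quotes, backslashes and control characters.
    private static void writeString(String s, StringBuilder out) {
        out.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        out.append('"');
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("user", "brian");
        doc.put("visits", 3);
        // Both lines print the same JSON document.
        System.out.println(fromMap(doc));
        System.out.println(fromJson("{\"user\":\"brian\",\"visits\":3}"));
    }
}
```

In this sketch each document is serialized on its own, so the working memory depends on the size of one document rather than on how many documents the job writes. The JSON path skips that step entirely, which is where the small performance difference comes from.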


Hope this helps,

On 3/14/2014 4:46 PM, Brian Stempin wrote:

Hi,
I'm currently using the elasticsearch-hadoop component to load data into my ES
cluster.  Currently, the ESOutputFormat will accept a Map or a Text that is
already in JSON format.  My question:  Is there a performance advantage to
using one over the other?

Thanks,
Brian


--
Costin



[hadoop] Performance in using Text vs. MapWritable

2014-03-14 Thread Brian Stempin
Hi,
I'm currently using the elasticsearch-hadoop component to load data into my 
ES cluster.  Currently, the ESOutputFormat will accept a Map or a Text that is already in JSON format.  My question:  Is there 
a performance advantage to using one over the other?

Thanks,
Brian
