Re: Loading JSON to ElasticSearch

2014-01-30 Thread David Pilato
Logstash uses the native API when you choose the elasticsearch output: 
http://logstash.net/docs/1.3.3/outputs/elasticsearch
As for separating the machines, I would say that you should test it. If your 
nodes are not heavily loaded (CPU / IO), you can probably use the same 
machine for extracting content and producing the JSON docs.

 HTH

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 28 January 2014 at 20:14:07, ZenMaster80 (sabdall...@gmail.com) wrote:

Thanks David, I will certainly look into logstash. Do you think it is a good 
idea to separate data analysis and indexing onto 2 different machines, since 
both require lots of CPU time? 
If I use logstash to send files over to ES, will I be able to use the native Java 
API or HTTP, and is there any preference between the two? I have noticed there are 
some things that aren't very easy and maybe don't even work in the native API. 
Thanks again

On Tuesday, January 28, 2014 1:05:32 PM UTC-5, David Pilato wrote:
Did you try https://github.com/dadoonet/fsriver?
I have never tested it with so many docs, but maybe it could help you here.

If you have already generated json files on a server, then I would recommend 
trying logstash to send them into elasticsearch. 
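
For readers who want to see what "sending already generated JSON files" involves, here is a minimal Python sketch. It is not logstash and not fsriver, just a plain-Python stand-in that reads `*.json` files from a directory and groups them into batches ready for indexing. The function name `iter_json_batches`, the batch size, and the use of the file name as document id are all assumptions made for illustration.

```python
import json
from pathlib import Path


def iter_json_batches(directory, batch_size=500):
    """Yield lists of (doc_id, document) pairs read from *.json files in a directory."""
    batch = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            # Hypothetical choice: the file name without extension is the document id.
            batch.append((path.stem, json.load(handle)))
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch
```

Batching keeps memory use bounded with millions of files and lets each batch be sent to the index server in a single request.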

My 2 cents

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 28 January 2014 at 16:46:06, ZenMaster80 (sabda...@gmail.com) wrote:

I would like to get your perspective on how to load JSON to the index server in my 
scenario.
We have about 15 million documents in html/pdf/... on server 1.
I would like to process the data and convert it to JSON on server 2.
I would like the indexer to index the JSON on a separate machine, server 3.

Ideally, I thought that on server 2, as I prepare the JSON and have it ready in memory, I 
could feed it to the indexer. But since data processing is CPU intensive, I want 
indexing to be done on a separate machine.
How do you guys deal with this, since I can no longer feed in-memory JSON to the 
indexer on a separate machine? Do I just grab the files from server 2 and index them 
then?
--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearc...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/05b977ac-00d0-45c0-9e58-8df523e6978c%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Loading JSON to ElasticSearch

2014-01-28 Thread ZenMaster80
I would like to get your perspective on how to load JSON to the index server in 
my scenario.
We have about 15 million documents in html/pdf/... on server 1.
I would like to process the data and convert it to JSON on server 2.
I would like the indexer to index the JSON on a separate machine, server 3.

Ideally, I thought that on server 2, as I prepare the JSON and have it ready in 
memory, I could feed it to the indexer. But since data processing is CPU 
intensive, I want indexing to be done on a separate machine.
How do you guys deal with this, since I can no longer feed in-memory JSON to 
the indexer on a separate machine? Do I just grab the files from server 2 and 
index them then?
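
One way to picture the in-memory option raised above: documents prepared on server 2 can be serialized into the newline-delimited body that Elasticsearch's `_bulk` HTTP endpoint accepts and posted to server 3, without writing files in between. The Python sketch below is only an illustration under assumptions: the host name `server3`, the index and type names passed in, and both function names are hypothetical, and the `_type` field reflects the Elasticsearch releases current at the time of this thread (later releases dropped mapping types).

```python
import json
import urllib.request


def build_bulk_body(docs, index, doc_type):
    """Serialize (doc_id, source) pairs into the newline-delimited _bulk format."""
    lines = []
    for doc_id, source in docs:
        # Action line: tells Elasticsearch where to index the document that follows.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    if not lines:
        return ""
    # The bulk endpoint requires every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"


def send_bulk(body, url="http://server3:9200/_bulk"):
    """POST one bulk body to the (hypothetical) index server and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        # Newer Elasticsearch versions expect an explicit content type.
        headers={"Content-Type": "application/x-ndjson"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `send_bulk(build_bulk_body([("1", {"title": "a"})], "docs", "doc"))` would index one document, assuming a node is reachable at that address. Whether this beats shipping files depends on how the CPU load actually splits between the machines, which is worth measuring.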



Re: Loading JSON to ElasticSearch

2014-01-28 Thread ZenMaster80
Thanks David, I will certainly look into logstash. Do you think it is a good 
idea to separate data analysis and indexing onto 2 different machines, since 
both require lots of CPU time? 
If I use logstash to send files over to ES, will I be able to use the native 
Java API or HTTP, and is there any preference between the two? I have noticed 
there are some things that aren't very easy and maybe don't even work in 
the native API. 
Thanks again.


