Re: EC2 Discovery

2014-03-21 Thread ZenMaster80
I am not sure if I missed something, but what you mentioned I believe I 
already tried, as shown in my original post.
I can connect to each machine individually, and I am able to index and query 
it fine with the default configuration, without any zen or ec2 settings. But 
when I turned them on as shown in the post, I get "Request failed 
to get to the server (status code: 0)" when trying to query the instance.
Did you mean I should try to see if I can access one instance from the 
other? I haven't tried that yet.

On Friday, March 21, 2014 4:46:40 AM UTC-4, Norberto Meijome wrote:

 Don't try ec2 discovery until you have tested that:
 - you can connect from one machine to another on port 9300 (nc as client 
 and server; basic networking/firewalling)
 - you can run a simple aws ec2 describe-instances call with the API key you 
 plan to use, and you can see the machines you need there. Bonus points for 
 filtering based on the rules you intend to use (sec group, tags). This is 
 to ensure your API keys have the access they need.

 Once you have those basic steps working, use them in the es config.

 Make sure you enable ec2 discovery and disable zen discovery (it will 
 run first and likely time out, and ec2 disco won't get to exec) - see the 
 config sketch below. 

 The other thing to watch out for is contacting nodes which are too busy to 
 ack your new node's request for cluster info... but that would be a problem 
 with zen disco too.
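 In elasticsearch.yml, that advice maps to something like this minimal 
 sketch, assuming the cloud-aws plugin is installed (the security group 
 name and the keys are placeholders):

 discovery.type: ec2                           # use ec2 discovery
 discovery.zen.ping.multicast.enabled: false   # don't let multicast run first and time out
 discovery.ec2.groups: my-es-group             # placeholder: filter candidates by security group
 cloud.aws.access_key: YOUR_ACCESS_KEY         # placeholder
 cloud.aws.secret_key: YOUR_SECRET_KEY         # placeholder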
 On 21/03/2014 12:31 PM, Raphael Miranda raphael...@gmail.com wrote:

 are both machines in the same security group?






Re: EC2 Discovery

2014-03-21 Thread ZenMaster80
I am not sure if I missed something, but what you mentioned I believe I 
already tried, as shown in my original post.
I can connect from one instance to another.
I can connect to each machine individually, and I am able to index and query 
it fine with the default configuration, without any zen or ec2 settings. But 
when I turned them on as shown in the post, I get "Request failed 
to get to the server (status code: 0)" when trying to query the instance, 
and when I do this, it won't even log anything; it is not getting that far.


On Friday, March 21, 2014 4:46:40 AM UTC-4, Norberto Meijome wrote:

 Don't try ec2 discovery until you have tested that:
 - you can connect from one machine to another on port 9300 (nc as client 
 and server; basic networking/firewalling)
 - you can run a simple aws ec2 describe-instances call with the API key you 
 plan to use, and you can see the machines you need there. Bonus points for 
 filtering based on the rules you intend to use (sec group, tags). This is 
 to ensure your API keys have the access they need.

 Once you have those basic steps working, use them in the es config.

 Make sure you enable ec2 discovery and disable zen discovery (it will 
 run first and likely time out, and ec2 disco won't get to exec). 

 The other thing to watch out for is contacting nodes which are too busy to 
 ack your new node's request for cluster info... but that would be a problem 
 with zen disco too.
 On 21/03/2014 12:31 PM, Raphael Miranda raphael...@gmail.com wrote:

 are both machines in the same security group?






EC2 Discovery

2014-03-20 Thread ZenMaster80
Any clues to what I am missing? I turned discovery trace on but don't see any 
useful info.



Re: Bulk Processor

2014-03-14 Thread ZenMaster80
David,

Sorry, I didn't quite follow - does it do the flushing automatically, or am I 
supposed to tell it?

On Wednesday, March 12, 2014 4:05:49 PM UTC-4, David Pilato wrote:

 It also flushes docs after a given time, let's say every 5 seconds.
 BTW there is a small issue which basically flushes the bulk every n-1 docs 
 instead of n.

 Fix is on the way.

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
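
 A minimal sketch of how that time-based flush is configured, assuming the 
 1.x Java API (the client variable, thresholds, and no-op listener here are 
 illustrative, not David's exact code):

 import org.elasticsearch.action.bulk.BulkProcessor;
 import org.elasticsearch.action.bulk.BulkRequest;
 import org.elasticsearch.action.bulk.BulkResponse;
 import org.elasticsearch.common.unit.ByteSizeUnit;
 import org.elasticsearch.common.unit.ByteSizeValue;
 import org.elasticsearch.common.unit.TimeValue;

 BulkProcessor processor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
     public void beforeBulk(long id, BulkRequest request) { /* e.g. log "Bulk Called" */ }
     public void afterBulk(long id, BulkRequest request, BulkResponse response) { /* "Bulk Succeeded" */ }
     public void afterBulk(long id, BulkRequest request, Throwable failure) { /* handle failure */ }
 })
         .setBulkActions(1000)                                // flush at 1000 actions...
         .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB))  // ...or at 5 MB...
         .setFlushInterval(TimeValue.timeValueSeconds(5))     // ...or every 5 seconds, whichever comes first
         .setConcurrentRequests(1)
         .build();

 With the flush interval set, the flushing happens automatically; you do not 
 have to trigger it yourself.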


 On 12 March 2014 at 20:51, ZenMaster80 sabda...@gmail.com 
 wrote:


 I don't quite understand what the bulk processor is doing; I would 
 like someone to explain how it is supposed to work, to make sure I designed 
 this correctly.
 I specify the number of actions: 1000.
 My feeder keeps pushing documents to it. It's more like a loop iterating 
 document folders, and I push each document to the bulk. I expected the 
 bulk to queue things until it reaches 1000 docs, then process the bulk?

 Yet this is how it logs; this comes from the callback functions of the 
 bulk processor.


 Bulk Called: ID= 1, Actions=33, MB=5.46250
 Bulk Called: ID= 2, Actions=29, MB=5.51660
 Bulk Succeeded: ID= 1, took= 921 ms
 Bulk Called: ID= 3, Actions=12, MB=5.691812
 Bulk Succeeded: ID= 2, took= 1526 ms

 .



 Bulk Called: ID= 23, Actions=8, MB=5.45294
 Bulk Succeeded: ID= 23, took= 751 ms
 Bulk Called: ID= 24, Actions=19, MB=5.383918
 Bulk Succeeded: ID= 24, took= 331 ms
 Bulk Called: ID= 25, Actions=22, MB=5.347542
 Bulk Succeeded: ID= 25, took= 694 ms
 Bulk Called: ID= 26, Actions=58, MB=5.249195
 Bulk Succeeded: ID= 26, took= 583 ms
 Bulk Called: ID= 27, Actions=89, MB=5.244396
 Bulk Succeeded: ID= 27, took= 588 ms.


 Bulk Called: ID= 47, Actions=17, MB=5.245771 ...


 Bulk Succeeded: ID= 47, took= 431 ms

 Finished Processing the whole thing








Re: Occasional client.transport.NoNodeAvailableException

2014-03-14 Thread ZenMaster80
I will post logs in a bit. I plan to run on EC2, but currently I am just 
running on a local machine: i7, 4G RAM.

I had int concurrentRequests = Runtime.getRuntime().availableProcessors(); 
(returns 8). 
If I change this value to just 1, I don't get the exception, but indexing 
performance slows down considerably. I am not sure if 8 requests is really 
overwhelming the node.

On Friday, March 14, 2014 3:58:21 PM UTC-4, Binh Ly wrote:

 I'm curious, is there anything else in the es log files? Also are you 
 running on EC2 micro instances?




Mapping Attachment plugin installation/Debian

2014-03-13 Thread ZenMaster80
I am having trouble finding how to install the above plugin. I installed 
Elasticsearch with Debian.
Typically on my local Linux machine I did bin/plugin ..., but I am not sure 
where bin/plugin goes with the Debian installation?

Thanks



Re: [Ann] Elasticsearch Image Plugin 1.1.0 released

2014-03-13 Thread ZenMaster80
Great, I am interested in trying this.

On Thursday, March 13, 2014 7:09:38 AM UTC-4, Kevin Wang wrote:

 Hi All,

 I've released version 1.1.0 of Elasticsearch Image Plugin.
 The Image Plugin is a Content-Based Image Retrieval plugin for 
 Elasticsearch using LIRE (Lucene Image Retrieval). It allows users to index 
 images and search for similar images.

 Changes in 1.1.0:

- Added limit in image query
- Added plugin version in es-plugin.properties


 https://github.com/kzwang/elasticsearch-image

 Also, I've created a demo website for this plugin (
 http://demo.elasticsearch-image.com/); it has 1,000,000 images (well, I 
 haven't finished indexing all the images yet, but it should be able to demo 
 this plugin) from the MIRFLICKR-1M collection (http://press.liacs.nl/mirflickr)


 Thanks,
 Kevin




How to install Mapping attachment Plugin with debian install

2014-03-13 Thread ZenMaster80
On my local machine, I do this: bin/plugin -install ...

With the Debian installation, I am not sure where the bin/plugin folder is?
Anyone know?



Re: How to install Mapping attachment Plugin with debian install

2014-03-13 Thread ZenMaster80
Thanks - I figured it out as soon as I posted.
I found this explained the directory structure well.

https://gist.github.com/mystix/5460660
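
With the Debian layout, the install command then looks something like this 
(plugin coordinates per the mapper-attachments project; the version number is 
only an example):

sudo /usr/share/elasticsearch/bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/2.0.0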

On Thursday, March 13, 2014 1:48:07 PM UTC-4, David Pilato wrote:

 It should be in /usr/share/elasticsearch/bin/



 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | 
 @elasticsearchfrhttps://twitter.com/elasticsearchfr


 On 13 March 2014 at 17:19:49, ZenMaster80 (sabda...@gmail.com) 
 wrote:

 On my local machine, I do this: bin/plugin -install ... 

 With the Debian installation, I am not sure where the bin/plugin folder is?
 Anyone know?





Bulk Processor

2014-03-12 Thread ZenMaster80

I don't quite understand what the bulk processor is doing; I would 
like someone to explain how it is supposed to work, to make sure I designed 
this correctly.
I specify the number of actions: 1000.
My feeder keeps pushing documents to it. It's more like a loop iterating 
document folders, and I push each document to the bulk. I expected the 
bulk to queue things until it reaches 1000 docs, then process the bulk?

Yet this is how it logs; this comes from the callback functions of the 
bulk processor.


Bulk Called: ID= 1, Actions=33, MB=5.46250
Bulk Called: ID= 2, Actions=29, MB=5.51660
Bulk Succeeded: ID= 1, took= 921 ms
Bulk Called: ID= 3, Actions=12, MB=5.691812
Bulk Succeeded: ID= 2, took= 1526 ms

.



Bulk Called: ID= 23, Actions=8, MB=5.45294
Bulk Succeeded: ID= 23, took= 751 ms
Bulk Called: ID= 24, Actions=19, MB=5.383918
Bulk Succeeded: ID= 24, took= 331 ms
Bulk Called: ID= 25, Actions=22, MB=5.347542
Bulk Succeeded: ID= 25, took= 694 ms
Bulk Called: ID= 26, Actions=58, MB=5.249195
Bulk Succeeded: ID= 26, took= 583 ms
Bulk Called: ID= 27, Actions=89, MB=5.244396
Bulk Succeeded: ID= 27, took= 588 ms.


Bulk Called: ID= 47, Actions=17, MB=5.245771 ...


Bulk Succeeded: ID= 47, took= 431 ms

Finished Processing the whole thing






Bulk Processor question

2014-03-12 Thread ZenMaster80

I don't quite understand what the bulk processor is doing; I would like 
someone to explain how it is supposed to work, to make sure I designed this 
correctly.
I specify the number of actions: 1000.
My feeder keeps pushing documents to it. It's more like a loop iterating 
document folders where I push each document to the bulk. I expected the 
bulk to queue things until it reaches 1000 docs? Then process the bulk?

Yet this is how it logs; this comes from the callback functions of the 
bulk processor.


Bulk Called: ID= 1, Actions=33, MB=5.46250
Bulk Called: ID= 2, Actions=29, MB=5.51660
Bulk Succeeded: ID= 1, took= 921 ms
Bulk Called: ID= 3, Actions=12, MB=5.691812
Bulk Succeeded: ID= 2, took= 1526 ms

.



Bulk Called: ID= 23, Actions=8, MB=5.45294
Bulk Succeeded: ID= 23, took= 751 ms
Bulk Called: ID= 24, Actions=19, MB=5.383918
Bulk Succeeded: ID= 24, took= 331 ms
Bulk Called: ID= 25, Actions=22, MB=5.347542
Bulk Succeeded: ID= 25, took= 694 ms
Bulk Called: ID= 26, Actions=58, MB=5.249195
Bulk Succeeded: ID= 26, took= 583 ms
Bulk Called: ID= 27, Actions=89, MB=5.244396
Bulk Succeeded: ID= 27, took= 588 ms.


Bulk Called: ID= 47, Actions=17, MB=5.245771 ...


Bulk Succeeded: ID= 47, took= 431 ms

Finished Processing the whole thing




Re: Bulk Processor question

2014-03-12 Thread ZenMaster80
My docs vary in size: some are very small, some are PDFs like those shown in 
the log there. How do you suggest I handle this, since I don't know when the 
docs will be small or large?

On Wednesday, March 12, 2014 4:01:53 PM UTC-4, Jörg Prante wrote:

 BulkProcessor has two thresholds: the number of actions (as you use, by 
 setting it to 1000) or a bulk request byte volume (default 5 MB). What you 
 see is the 5 MB limit kicking in; your docs are quite large.

 Jörg
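
 A sketch of setting both thresholds, assuming the 1.x builder API (client 
 and listener assumed, values illustrative); whichever limit trips first 
 triggers the flush, so mixed doc sizes take care of themselves:

 BulkProcessor bulk = BulkProcessor.builder(client, listener)
         .setBulkActions(1000)                                 // count threshold
         .setBulkSize(new ByteSizeValue(10, ByteSizeUnit.MB))  // byte-volume threshold
         .build();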


 On Wed, Mar 12, 2014 at 8:54 PM, ZenMaster80 sabda...@gmail.com 
 wrote:


 I don't quite understand what the bulk processor is doing; I would like 
 someone to explain how it is supposed to work, to make sure I designed this 
 correctly.
 I specify the number of actions: 1000.
 My feeder keeps pushing documents to it. It's more like a loop iterating 
 document folders where I push each document to the bulk. I expected the 
 bulk to queue things until it reaches 1000 docs? Then process the bulk?

 Yet this is how it logs; this comes from the callback functions of the 
 bulk processor.


 Bulk Called: ID= 1, Actions=33, MB=5.46250
 Bulk Called: ID= 2, Actions=29, MB=5.51660
 Bulk Succeeded: ID= 1, took= 921 ms
 Bulk Called: ID= 3, Actions=12, MB=5.691812
 Bulk Succeeded: ID= 2, took= 1526 ms

 .



 Bulk Called: ID= 23, Actions=8, MB=5.45294
 Bulk Succeeded: ID= 23, took= 751 ms
 Bulk Called: ID= 24, Actions=19, MB=5.383918
 Bulk Succeeded: ID= 24, took= 331 ms
 Bulk Called: ID= 25, Actions=22, MB=5.347542
 Bulk Succeeded: ID= 25, took= 694 ms
 Bulk Called: ID= 26, Actions=58, MB=5.249195
 Bulk Succeeded: ID= 26, took= 583 ms
 Bulk Called: ID= 27, Actions=89, MB=5.244396
 Bulk Succeeded: ID= 27, took= 588 ms.


 Bulk Called: ID= 47, Actions=17, MB=5.245771 ...


 Bulk Succeeded: ID= 47, took= 431 ms

 Finished Processing the whole thing








BulkProcessor

2014-03-07 Thread ZenMaster80
If I set the bulk size to 5000 files and I feed it 5000, 5000, 5000, ... 
what happens if the # of files in the last batch is, for instance, 2000? How 
does it know that it needs to process the last 2000?
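
A sketch of the usual pattern, assuming the 1.x BulkProcessor API 
(bulkProcessor and requests stand in for your processor and feed): full 
batches go out automatically, and the trailing partial batch is sent when 
you flush or close the processor.

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.index.IndexRequest;

void feed(BulkProcessor bulkProcessor, Iterable<IndexRequest> requests) {
    for (IndexRequest request : requests) {
        bulkProcessor.add(request);  // full batches of 5000 are sent automatically
    }
    bulkProcessor.flush();  // sends whatever is left, e.g. the last 2000 docs
    bulkProcessor.close();  // close() also flushes any remaining actions
}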



Re: indexing binary

2014-02-27 Thread ZenMaster80
Binh, thanks - with your help I think I am closer to the answer. With the 
sample mapping you provided, I should be able to provide the base64 
contents of the image file as the "content" field, and the OCR text as the 
"text" field. So, when the OCR text is searched, I can return the content, 
which is the image. With the above mapping I believe the image is saved in 
the _source as well as in the field, for highlighting purposes. Can I 
prevent it from being stored in _source with something like this?

startObject("_source").field("enabled", "no").endObject()

On Thursday, February 27, 2014 8:29:25 AM UTC-5, Binh Ly wrote:

 You certainly can add a new field, and then just put the OCR text into 
 that new field. So for example:

 Mapping:

 PutMappingResponse putMappingResponse = new PutMappingRequestBuilder(
         client.admin().indices()).setIndices(INDEX_NAME).setType(DOCUMENT_TYPE).setSource(
     XContentFactory.jsonBuilder().startObject()
         .field(DOCUMENT_TYPE).startObject()
             .field("properties").startObject()
                 .field("text").startObject()
                     .field("type", "string")
                 .endObject()
                 .field("file").startObject()
                     .field("store", "yes")
                     .field("type", "attachment")
                     .field("fields").startObject()
                         .field("file").startObject()
                             .field("store", "yes")
                         .endObject()
                     .endObject()
                 .endObject()
             .endObject()
         .endObject()
     .endObject()
 ).execute().actionGet();

 Then put the OCR text into the text field:

 IndexResponse indexResponse = client.prepareIndex(INDEX_NAME, 
 DOCUMENT_TYPE, "1")
     .setSource(XContentFactory.jsonBuilder().startObject()
         .field("text", ocrText)
         .field("file").startObject()
             .field("content", fileContents)
             .field("_indexed_chars", -1)
         .endObject()
     .endObject()
 ).execute().actionGet();

 You probably don't need to index the image binary information - not sure 
 what you would need it for.




Re: indexing binary

2014-02-27 Thread ZenMaster80
Sorry for the confusion - I do want PDFs, but I am concerned with the 
retrieval of the image file when its OCR text is searched. I must be missing 
something.
As shown below, I provide two fields: the "text" and the "content". In your 
second post you say I don't need the "content" field for images? So, how 
does the search return the image to the asking client (a web app, for 
instance) when a text match occurs with the image's OCR text? If I only 
include "text", then it will return the text part of the image only and not 
the image, correct?

source(XContentFactory.jsonBuilder()
    .startObject()
        .field("text", ocrText)  // extracted OCR text from image
        .field("file").startObject()
            .field("content", fileContents)  // content is the base64-encoded string of the image file? is it needed?
            .field("_indexed_chars", -1)
        .endObject()
    .endObject()



On Thursday, February 27, 2014 1:16:36 PM UTC-5, Binh Ly wrote:

 Oh, the attachment part is for your PDF. If you don't need to index PDFs 
 then just remove that part:

 PutMappingResponse putMappingResponse = new PutMappingRequestBuilder(
         client.admin().indices()).setIndices(INDEX_NAME).setType(DOCUMENT_TYPE).setSource(
     XContentFactory.jsonBuilder().startObject()
         .field(DOCUMENT_TYPE).startObject()
             .field("properties").startObject()
                 .field("text").startObject()
                     .field("type", "string")
                 .endObject()
             .endObject()
         .endObject()
     .endObject()
 ).execute().actionGet();

 Indexing:

 IndexResponse indexResponse = client.prepareIndex(INDEX_NAME, DOCUMENT_TYPE, "1")
     .setSource(XContentFactory.jsonBuilder().startObject()
         .field("text", ocrText)
         .endObject()
     ).execute().actionGet();





indexing binary

2014-02-26 Thread ZenMaster80
I index PDFs using Apache Tika with the following mapping.


.field("type", "attachment")
.field("fields")
.startObject()
.startObject("file")
.field("store", "yes")
.endObject()

I want to index photos; I am able to extract text using OCR. I am confused 
about how to index the text, though - do I treat it like any document and not 
as an attachment? I have the text as a String when extracted, not base64 like 
in the case of PDFs.
I am confused about how it gets stored, and how it works if I need to make 
it available during search. Can someone explain how I do this?

XContentFactory.jsonBuilder().startObject()

    .startObject(INDEX_TYPE)

    .startObject("_source").field("enabled", "no").endObject()  // This line will not store the base64 whole _source

    .startObject("properties")



So, my photo object becomes something like this - what about the source (the 
image itself)?
jsonObject
{
  "content": "text extracted from image",
  "name": "my_photo.png"
}


// add to the bulk indexer for indexing

bulkProcessor.add(Requests.indexRequest(INDEX_NAME).type(INDEX_TYPE).id(
    jsonObject.getString("name")).source(jsonObject.toString()));



Re: TransportSerializationException: Failed to deserialize exception response from stream

2014-02-20 Thread ZenMaster80
I ran into the same problem; the version was correct and the plugins were 
installed. In my case port 9300 was not opened for the TransportClient; once 
I opened it, it worked fine.

On Thursday, February 20, 2014 9:06:42 AM UTC-5, Tiago Rodrigues wrote:

 I get this error sometimes when I try to create an index. 

 My version of Java in Elasticsearch is the same as on the client server. 

 This error does not always occur, unlike what was seen in other posts.

 The log:

 Exception in thread main 
 org.elasticsearch.transport.TransportSerializationException: Failed to 
 deserialize exception response from stream
 at 
 org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:169)
 at 
 org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:123)
 at 
 org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
 at 
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at 
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
 at 
 org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
 at 
 org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
 at 
 org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
 at 
 org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
 at 
 org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
 at 
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at 
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
 at 
 org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
 at 
 org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
 at 
 org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
 at 
 org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
 at 
 org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
 at 
 org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
 at 
 org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
 at 
 org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
 at 
 org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.io.EOFException
 at 
 java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2598)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1318)
 at java.io.ObjectInputStream.access$300(ObjectInputStream.java:206)
 at 
 java.io.ObjectInputStream$GetFieldImpl.readFields(ObjectInputStream.java:2153)
 at java.io.ObjectInputStream.readFields(ObjectInputStream.java:540)
 at java.net.InetSocketAddress.readObject(InetSocketAddress.java:282)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
 at 
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
 at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 at 
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 at 
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 at 

Indexing Images

2014-02-20 Thread ZenMaster80
I am a bit confused about this topic. I would like to index images 
(PNGs, JPEGs, GIFs...); my understanding is that I need to extract and index 
the text portions of images - I don't really care about the metadata. So, I 
looked online and decided to use Apache Tika, which I also use to extract 
text from and index PDFs (PDFs work fine).
- How do I get the text part of images? All I am able to extract is 
metadata, which I don't need.
- Ideally I want to say: if this image has no text to extract, then 
discard/ignore it? Can you please clarify this topic a bit more and provide 
any samples if available? Additionally, I don't want to store the 
base64-encoded document.

PutMappingResponse putMappingResponse = new PutMappingRequestBuilder(
        client.admin().indices()).setIndices(INDEX_NAME).setType(INDEX_TYPE).setSource(
    XContentFactory.jsonBuilder().startObject()
        .startObject(INDEX_TYPE)
            .startObject("_source").field("enabled", "no").endObject()  // I believe this line will not store the base64 whole _source; below I store only the text portion of the file
            .startObject("properties")
                .startObject("file")
                    .field("term_vector", "with_positions_offsets")
                    .field("store", "no")
                    .field("type", "attachment")
                    .field("fields")
                    .startObject()
                        .startObject("file")
                            .field("store", "yes")
                        .endObject()
                    .endObject()
                .endObject()
            .endObject()
        .endObject()
    .endObject()
).execute().actionGet();


public static void testImage(File file) throws IOException, 
SAXException, TikaException {

    Tika tika = new Tika();

    InputStream inputStream = new BufferedInputStream(new FileInputStream(file));

    Metadata metadata = new Metadata();

    ContentHandler handler = new DefaultHandler();

    Parser parser = new JpegParser();

    ParseContext context = new ParseContext();

    String mimeType = tika.detect(inputStream);

    metadata.set(Metadata.CONTENT_TYPE, mimeType);

    parser.parse(inputStream, handler, metadata, context);

    for (int i = 0; i < metadata.names().length; i++) {  // metadata - I don't care for this

        String name = metadata.names()[i];
        System.out.println(name + " : " + metadata.get(name));

    }

}




Re: Indexing Images

2014-02-20 Thread ZenMaster80
Thanks David. I agree that OCR, and maybe any kind of text extraction, should 
be done pre-Elasticsearch indexing. But I am just wondering if Apache 
Tika supports this, or if anyone has experience with using a certain tool. 
I do plan to extract before indexing.
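
Tika by itself only reads image metadata; the OCR step needs a separate 
engine. A heavily hedged sketch using the Tess4J wrapper around Tesseract - 
an assumption, not something Tika provided out of the box at the time (the 
datapath is a placeholder):

import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class OcrSketch {
    public static String extractText(File image) throws TesseractException {
        Tesseract tesseract = Tesseract.getInstance();              // Tess4J 1.x singleton
        tesseract.setDatapath("/usr/share/tesseract-ocr/tessdata"); // placeholder: trained-data path
        String text = tesseract.doOCR(image);                       // run OCR on the image file
        return text == null ? "" : text.trim();                     // empty result -> nothing to index
    }
}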

On Thursday, February 20, 2014 11:38:31 AM UTC-5, ZenMaster80 wrote:

 I am a bit confused about this topic. I would like to index images 
 (PNGs, JPEGs, GIFs...); my understanding is that I need to extract and index 
 the text portions of images - I don't really care about the metadata. So, I 
 looked online and decided to use Apache Tika, which I also use to extract 
 text from and index PDFs (PDFs work fine).
 - How do I get the text part of images? All I am able to extract is 
 metadata, which I don't need.
 - Ideally I want to say: if this image has no text to extract, then 
 discard/ignore it? Can you please clarify this topic a bit more and provide 
 any samples if available? Additionally, I don't want to store the 
 base64-encoded document.

 PutMappingResponse putMappingResponse = new PutMappingRequestBuilder(
         client.admin().indices()).setIndices(INDEX_NAME).setType(INDEX_TYPE).setSource(
     XContentFactory.jsonBuilder().startObject()
         .startObject(INDEX_TYPE)
             .startObject("_source").field("enabled", "no").endObject()  // I believe this line will not store the base64 whole _source; below I store only the text portion of the file
             .startObject("properties")
                 .startObject("file")
                     .field("term_vector", "with_positions_offsets")
                     .field("store", "no")
                     .field("type", "attachment")
                     .field("fields")
                     .startObject()
                         .startObject("file")
                             .field("store", "yes")
                         .endObject()
                     .endObject()
                 .endObject()
             .endObject()
         .endObject()
     .endObject()
 ).execute().actionGet();


 public static void testImage(File file) throws IOException, 
 SAXException, TikaException {

     Tika tika = new Tika();

     InputStream inputStream = new BufferedInputStream(new FileInputStream(file));

     Metadata metadata = new Metadata();

     ContentHandler handler = new DefaultHandler();

     Parser parser = new JpegParser();

     ParseContext context = new ParseContext();

     String mimeType = tika.detect(inputStream);

     metadata.set(Metadata.CONTENT_TYPE, mimeType);

     parser.parse(inputStream, handler, metadata, context);

     for (int i = 0; i < metadata.names().length; i++) {  // metadata - I don't care for this

         String name = metadata.names()[i];
         System.out.println(name + " : " + metadata.get(name));

     }

 }






Re: Searching PDF

2014-02-07 Thread ZenMaster80
So, what's wrong with this?
GET localhost:9200/_search
{
  "fields": "file",
  "query": {
    "match_all": {}
  }
}

..
"hits": {
  "total": 1,
  "max_score": 1,
  "hits": [
     {
        "_index": "docs",
        "_type": "pdf",
        "_id": "1",
        "_score": 1,
        "fields": {
           "file":
JVBERi0xLjQNJeLjz9MNCjE1OCAwIG9iaiA8PC9MaW5lYXJpemVkIDEvTCAzODExNDQvTyAxNjMvRSAyNDcxMS9OIDEzL1QgMzc3OTM2L0ggWyAxMTU2IDQ2OF0+Pg1lbmRvYmoNICAgICAgICAgICAgDQp4cmVmDQoxNTggNDMNCjAwMDAwMDAwMTYgMDAwMDAgbg0KMDAwMDAwMTYyNCAwMDAwMCBuDQowMDAwMDAxNzk0IDAwMDAwIG4NCjAwMDAwMDE4MjAgMDAwMDAgbg0KMDAwMDAwMTg2NiAwMDAwMCBuDQowMDAwMDAxOTAwIDAwMDAwIG4NCjAwMDAwMDIxMDkgMDAwMDAgbg0KMDAwMDAwMjE4OSAwMDAwMCBuDQowMDAwMDAyMjY3IDAwMDAwIG4NCjAwMDAwMDIzNDQgMDAwMDAgbg0KMDAwMDAwMjQyMSAwMDAwMCBuDQowMDAwMDAyNDk4IDAwMDAwIG4NCjAwMDAwMDI1NzUgMDAwMDAgbg0KMDAwMDAwMjY1MiAwMDAwMCBuDQowMDAwMDAyNzI5IDAwMDAwIG4NCjAwMDAwMDI4MDYgMDAwMDAgbg0KMDAwMDAwMjg4MyAwMDAwMCBuDQowMDAwMDAyOTYwIDAwMDAwIG4NCjAwMDAwMDMwMzYgMDAwMDAgbg0KMDAwMDAwMzE5OCAwMDAwMCBuDQowMDAwMDAzNjMwIDAwMDAwIG4NCjAwMDAwMDM2NjYgMDAwMDAgbg0KMDAwMDAwMzkwMCAwMDAwMCBuDQowMDAwMDAzOTc3IDAwMDAwIG4NCjAwMDAwMDQwNTMgMDAwMDAgbg0KMDAwMDAwNDkxMSAwMDAwMCBuDQowMDAwMDA1NzA5IDAwMDAwIG4NCjAwMDAwMD


On Friday, February 7, 2014 4:48:46 PM UTC-5, Binh Ly wrote:

 You should be able to get the textual field values by explicitly 
 requesting them from fields. For example:

 GET localhost:9200/_search
 {
   "fields": "*",
   "query": {
     "match_all": {}
   }
 }




Re: Searching PDF

2014-02-07 Thread ZenMaster80
You are correct, my JSON mapping had a wrong entry. Thanks for the help!

On Friday, February 7, 2014 6:10:50 PM UTC-5, Binh Ly wrote:

 It looks like that indexing code might not be correct. I just tried this 
 code and it works for me:

   try {
     String fileContents = readContent(new File("fn6742.pdf"));

     try {
       DeleteIndexResponse deleteIndexResponse = new DeleteIndexRequestBuilder(
           client.admin().indices(), INDEX_NAME).execute().actionGet();
       if (deleteIndexResponse.isAcknowledged()) {
         System.out.println("Deleted index");
       }
     }
     catch (Exception e) {
       // ignore
     }

     CreateIndexResponse createIndexResponse = new CreateIndexRequestBuilder(
         client.admin().indices(), INDEX_NAME).execute().actionGet();

     if (createIndexResponse.isAcknowledged()) {
       System.out.println("Created index");
     }

     PutMappingResponse putMappingResponse = new PutMappingRequestBuilder(
         client.admin().indices()).setIndices(INDEX_NAME).setType(DOCUMENT_TYPE).setSource(
       XContentFactory.jsonBuilder().startObject()
         .field("doc").startObject()
           .field("properties").startObject()
             .field("file").startObject()
               .field("term_vector", "with_positions_offsets")
               .field("store", "yes")
               .field("type", "attachment")
               .field("fields").startObject()
                 .field("file").startObject()
                   .field("store", "yes")
                 .endObject()
               .endObject()
             .endObject()
           .endObject()
         .endObject()
       .endObject()
     ).execute().actionGet();

     if (putMappingResponse.isAcknowledged()) {
       System.out.println("Successfully defined mapping");
     }

     IndexResponse indexResponse = client.prepareIndex(INDEX_NAME, DOCUMENT_TYPE, "1")
       .setSource(XContentFactory.jsonBuilder()
         .startObject()
           .field("file").startObject()
             .field("content", fileContents)
             .field("_indexed_chars", -1)
           .endObject()
         .endObject()
       ).execute().actionGet();

     System.out.println("Document indexed success: " + indexResponse.isCreated());
   } catch (Exception e) {
     System.out.println(e.toString());
   }


 And then when I query:

 {
   "fields": "*",
   "query": {
     "match_all": {}
   }
 }

 I get back this:

 {
   "took": 2,
   "timed_out": false,
   "_shards": {
     "total": 5,
     "successful": 5,
     "failed": 0
   },
   "hits": {
     "total": 1,
     "max_score": 1.0,
     "hits": [ {
       "_index": "msdocs",
       "_type": "doc",
       "_id": "1",
       "_score": 1.0,
       "fields": {
         "file": [ "\n1\nISL99201\nCAUTION: These devices are sensitive to 
 electrostatic discharge; follow proper IC Handling 
 Procedures.\n1-888-INTERSIL or 1-888-468-3774" ]
       }
     } ]
   }
 }





searching while indexing

2014-02-06 Thread ZenMaster80
I am unclear on how searching works while indexing. Let's say I already 
have a document indexed (version 1), and I updated the document, so I will 
index it again (version 2). What happens when the user searches while 
version 2 is being indexed? Will the user get results from version 1?



Re: Improving Bulk Indexing

2014-02-04 Thread ZenMaster80
Good to know; I will keep this in mind, even though I will try to go for 
SSDs, as I personally had great success with them in the past! When you say 
10-12 MB/sec, is this with doc parsing/processing or just ES index time? 
For my humble test on a quad-core laptop, I am pushing 6 MB/sec with 
processing and 9 MB/sec if I don't include processing time. I tried playing 
with many different settings; I think this is about all it's going to do 
given the machine I am running on. 

On Tuesday, February 4, 2014 4:22:10 PM UTC-5, Jörg Prante wrote:

 My use case is bibliographic data indexing of academic and public 
 libraries. There are ~100m records from various sources that I regularly 
 extract, transform into JSON-LD, and load into Elasticsearch. Some are 
 files, some are fetched by JDBC. I have six 32-core servers in our place, 
 organized in 2 ES clusters. Self installed and configured - no cloud VMs :) 
 With bulk indexing I can push around 10-12m/sec to an ES cluster. 
 Transforming docs is rather complex, needs re-processing of indexed data. 
 The job is done in a few hours so I can perform ETL every night. No SSD, 
 too expensive, but SAS-2 (6Gbit/sec) RAID-0 drives of ~1TB per server.

 Jörg



 On Tue, Feb 4, 2014 at 5:22 PM, ZenMaster80 sabda...@gmail.com 
 wrote:

 Jörg,

 Great, I learned a lot about the process from your responses. Could you 
 elaborate more on your use case? Mine I think will be similar to yours, 
 where processing/feeding is on one server and I will use the transport 
 client; the index nodes will be on EC2. So, when I do get to setting up EC2 
 nodes, I believe I should mostly be looking for big cores and SSDs.
 For the current test, besides running long feeds to gauge performance and 
 checking the analyzers, I take it there isn't much else I can do to make a 
 significant impact?


 On Tuesday, February 4, 2014 3:11:14 AM UTC-5, Jörg Prante wrote:

 SSD will improve overall performance very much, yes. Disk drives are the 
 slowest part in the chain and this will help. No more low IOPS, so it will 
 significantly reduce the load on CPU (less IO waits).

 More RAM will not help that much. In fact, more RAM will slow down 
 persisting; it increases pressure on the memory-to-disk part. ES obviously 
 does not depend on large RAM for persisting data - some MB suffice - but you 
 can try and see for yourself.

 85 MB is not sufficient for testing index segment merging and GC 
 effects; you should run a bulk indexing feed not for seconds but for at 
 least 20-30 minutes, if not for hours.

 Also check if your mapping can be simplified, the less complex 
 analyzers, the faster ES can index.

 You should also time how long your feed program takes to process 
 your input without the bulk indexing part. Then you see a bottom line, 
 and maybe more room for improvement outside ES. 

 In my use case, it helped to move the feed program to another server and 
 use the TransportClient with a speedup of ~30%.

 I agree that 5.5 MB/sec is not the end of the line, but that heavily 
 depends on your hard- and software configuration (machine, OS, file 
 systems, JVM).

 Jörg







Re: Improving Bulk Indexing

2014-02-03 Thread ZenMaster80
Jörg,

Just so I understand this: if I were to index 100 MB worth of data total 
with chunk volumes of 5 MB each, this means I have to index 20 times. If I 
were to set the bulk size to 20 MB, I will have to index 5 times. 
This is a small data size; picture that I have millions of documents. Are you 
saying the first method is better because GC operations would be faster?

Thanks again

On Monday, February 3, 2014 9:47:46 AM UTC-5, Jörg Prante wrote:

 Note, bulk operates just on network transport level, not on index level 
 (there are no transactions or chunks). Bulk saves network roundtrips, while 
 the execution of index operations is essentially the same as if you 
 transferred the operations one by one.

 To change refresh interval to -1, use an update settings request like this:


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html

 ImmutableSettings.Builder settingsBuilder = 
 ImmutableSettings.settingsBuilder();
 settingsBuilder.put("refresh_interval", "-1");
 UpdateSettingsRequest updateSettingsRequest = new 
 UpdateSettingsRequest("myIndexName")
         .settings(settingsBuilder);
 client.admin().indices()
         .updateSettings(updateSettingsRequest)
         .actionGet();

 Jörg





Re: Improving Bulk Indexing

2014-02-03 Thread ZenMaster80
Thanks again for clarifying this; I think I understand it. What I was 
referring to in my prior posts was the difference between setting 1000 
documents vs 10,000 documents: I was thinking a bigger chunk volume 
would produce fewer over-the-wire index requests, but I understand your 
reasoning about thrashing and slow GC. The numbers below kind of support 
my theory: as I increased the chunk to 10 MB or 10,000 docs, I saw a slight 
improvement in total indexing time (I think).
I would like to get your/others' feedback on some numbers/benchmarks. I 
tested with BulkRequest and with BulkProcessor, both with similar results (I 
seem to think it is slow?)

- Same source for testing (85 MB)
- Running one node / 1 shard / 0 replicas on a local MacBook, 8 cores, 4G RAM
- Bulk batch size 1 MB, concurrentRequests = 1: indexed 85 MB in ~17 seconds.
- Bulk batch size 1 MB, concurrentRequests = 8: indexed 85 MB in ~15 seconds.
- Bulk batch size 5 MB, concurrentRequests = 1: indexed 85 MB in ~15 seconds.
- Bulk batch size 5 MB, concurrentRequests = 8: indexed 85 MB in ~17 seconds.
- Bulk batch size 10 MB, concurrentRequests = 1: indexed 85 MB in ~13 seconds.
- Bulk batch size 10 MB, concurrentRequests = 8: indexed 85 MB in ~13 seconds.
- Using number of docs 
--
- Bulk 1000 docs, concurrentRequests = 1: indexed 85 MB in ~15 seconds.
- Bulk 1000 docs, concurrentRequests = 8: indexed 85 MB in ~13 seconds.
- Bulk 10,000 docs, concurrentRequests = 1: indexed 85 MB in ~15 seconds.
- Bulk 10,000 docs, concurrentRequests = 8: indexed 85 MB in ~12/~13 seconds.

OK, so an average of 15 sec for 85 MB is 5.5 MB/sec. Why do I think this is 
slow? I am not sure if I am doing the right math, but for 20 million docs 
(27 TB of data), this will take 2 days?
I understand that with better machines - SSDs and more RAM - I will get 
better results. However, I would like to optimize what I have now to the 
fullest before scaling up. What other configurations can I tweak to improve 
my current test?

.put("client.transport.sniff", true)

.put("refresh_interval", "-1") 

.put("number_of_shards", 1)

.put("number_of_replicas", 0)
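
(For reference, a sketch of where those four settings would live, assuming 
the 1.x Java API: client.transport.sniff belongs on the TransportClient, 
while the other three are index-level settings; "myindex" is a placeholder.)

Settings clientSettings = ImmutableSettings.settingsBuilder()
        .put("client.transport.sniff", true)
        .build();
TransportClient client = new TransportClient(clientSettings);

Settings indexSettings = ImmutableSettings.settingsBuilder()
        .put("refresh_interval", "-1")
        .put("number_of_shards", 1)
        .put("number_of_replicas", 0)
        .build();
client.admin().indices().prepareCreate("myindex")
        .setSettings(indexSettings).execute().actionGet();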



On Monday, February 3, 2014 2:02:32 PM UTC-5, Jörg Prante wrote:

 Not sure if I understand.

 If I had to index a pile of documents, say 15M, I would build bulk requests 
 of 1000 documents, where each doc is on avg ~1K, so I end up at ~1MB. I 
 would not care about different doc sizes as they equal out over the total 
 amount. Then I send this bulk request over the wire. With a threaded bulk 
 feeder, I can control concurrent bulk requests of up to the number of CPU 
 cores, say 32 cores. Then repeat. In total, I send 15K bulk requests.

 The effect is that on the ES cluster, each bulk request of 1M size 
 allocates only a few resources on the heap, and the bulk request can be 
 processed fast. If the cluster is slow, the client sees the ongoing bulk 
 requests piling up before bulk responses are returned, and can control bulk 
 capacity against a maximum concurrency limit. If the cluster is fast, the 
 client receives responses almost instantly, and the client can decide if it 
 is more appropriate to increase bulk request size or concurrency.

 Does it make sense?

 Jörg




 On Mon, Feb 3, 2014 at 5:06 PM, ZenMaster80 sabda...@gmail.com 
 wrote:

 Jörg,

 Just so I understand this: if I were to index 100 MB worth of data total 
 with chunk volumes of 5 MB each, this means I have to index 20 times. If I 
 were to set the bulk size to 20 MB, I will have to index 5 times. 
 This is a small data size; picture that I have millions of documents. Are you 
 saying the first method is better because GC operations would be faster?

 Thanks again


 On Monday, February 3, 2014 9:47:46 AM UTC-5, Jörg Prante wrote:

 Note, bulk operates just on network transport level, not on index level 
 (there are no transactions or chunks). Bulk saves network roundtrips, while 
 the execution of index operations is essentially the same as if you 
 transferred the operations one by one.

 To change refresh interval to -1, use an update settings request like 
 this:

 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html

 ImmutableSettings.Builder settingsBuilder = 
 ImmutableSettings.settingsBuilder();
 settingsBuilder.put("refresh_interval", "-1");
 UpdateSettingsRequest updateSettingsRequest = new 
 UpdateSettingsRequest("myIndexName")
         .settings(settingsBuilder);
 client.admin().indices()
         .updateSettings(updateSettingsRequest)
         .actionGet();

  Jörg


Loading JSON to ElasticSearch

2014-01-28 Thread ZenMaster80
I would like to get your perspective on how to load json to index server in 
my scenario.
We have about 15 million documents in html/pdf/... on Server 1
I would like to process the data and convert to json on server 2
I would like the indexer to index json n a separate machine/server server 3

Ideally I thought on Server 2, as I prepare json and have it ready in 
memory, I can feed it to indexer. But since data processing is cpu 
intensive, I want indexing to be done on a separate machines/server.
How do you guys deal with this since I can no longer feed in-memory json to 
the indexer on separate machine? Do I just grab files from server 2 and 
index them then?
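
A minimal sketch of that setup, assuming the 1.x TransportClient: server 2 
keeps the JSON in memory and ships it to ES on server 3 over port 9300 (the 
host, cluster name, and index/type names are placeholders).

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "my-cluster")  // placeholder
        .build();
Client client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("server3.example.com", 9300));

// the JSON built in memory on server 2 can be indexed directly - no files needed on server 3
String jsonString = "{\"title\":\"example\"}";  // placeholder document
client.prepareIndex("myindex", "doc").setSource(jsonString).execute().actionGet();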



Re: Loading JSON to ElasticSearch

2014-01-28 Thread ZenMaster80
Thanks David, I will certainly look into logstash. Do you think it is a good 
idea to separate data analysis and indexing onto 2 different machines, since 
both require lots of CPU time? 
If I use logstash to send files over to ES, will I be able to use the native 
Java API or HTTP, and is there any preference for the API? I have noticed 
there are some things that aren't very easy, and may not even work, in 
the native API? 
Thanks again.

On Tuesday, January 28, 2014 1:05:32 PM UTC-5, David Pilato wrote:

 Did you try https://github.com/dadoonet/fsriver?
 Never tested it with so many docs, but maybe it could help you here?

 If you have already generated JSON files on a server, then I would 
 recommend trying logstash to send them into elasticsearch. 

 My 2 cents

 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | 
 @elasticsearchfrhttps://twitter.com/elasticsearchfr


 On 28 January 2014 at 16:46:06, ZenMaster80 (sabda...@gmail.com) 
 wrote:

 I would like to get your perspective on how to load JSON to the index 
 server in my scenario. 
 We have about 15 million documents in html/pdf/... on server 1.
 I would like to process the data and convert it to JSON on server 2.
 I would like the indexer to index the JSON on a separate machine/server, 
 server 3.

 Ideally I thought that on server 2, as I prepare the JSON and have it ready 
 in memory, I can feed it to the indexer. But since data processing is CPU 
 intensive, I want indexing to be done on a separate machine/server.
 How do you guys deal with this, since I can no longer feed in-memory JSON 
 to the indexer on a separate machine? Do I just grab files from server 2 and 
 index them then?
  --
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearc...@googlegroups.com javascript:.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/05b977ac-00d0-45c0-9e58-8df523e6978c%40googlegroups.com
 .
 For more options, visit https://groups.google.com/groups/opt_out.



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a02427ec-a3d8-484f-9cfb-2ba7628192b1%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Loading JSON to ElasticSearch

2014-01-28 Thread ZenMaster80
Thanks David, I will certainly look into logstash. Do you think it is a 
good idea to separate data analysis and indexing onto 2 different machines, 
since both require lots of CPU time?
If I use logstash to send files over to ES, will I be able to use the 
native Java API or HTTP, and is there any preference between the APIs? I 
have noticed there are some things that aren't very easy, and maybe don't 
even work, in the native API?
Thanks again

On Tuesday, January 28, 2014 1:05:32 PM UTC-5, David Pilato wrote:

 Did you try https://github.com/dadoonet/fsriver?
 Never tested it with so many docs, but maybe it could help you here?

 If you have already generated json files on a server, then I would 
 recommend trying logstash to send them into elasticsearch. 

 My 2 cents

 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | 
 @elasticsearchfrhttps://twitter.com/elasticsearchfr


 On 28 January 2014 at 16:46:06, ZenMaster80 (sabda...@gmail.com) wrote:

 I would like to get your perspective on how to load JSON to the index 
 server in my scenario. 
 We have about 15 million documents in html/pdf/... on Server 1.
 I would like to process the data and convert it to JSON on Server 2.
 I would like the indexer to index the JSON on a separate machine, Server 3.

 Ideally I thought on Server 2, as I prepare the JSON and have it ready in 
 memory, I can feed it to the indexer. But since data processing is CPU 
 intensive, I want indexing to be done on a separate machine/server.
 How do you guys deal with this since I can no longer feed in-memory JSON 
 to the indexer on a separate machine? Do I just grab files from Server 2 
 and index them there?





Native client or REST

2014-01-23 Thread ZenMaster80
I thought I understood this, but maybe not. I hope someone can shed some 
light on this.

I have to index tons of files, and I would like to be able to query them 
from our web application written in JavaScript; everything will be running 
on AWS EC2.
Question: if I index the files using the native Java API, will I be able to 
perform queries/searches from the web application via the HTTP/REST API?
I am curious to know how people approach this. Note: I would prefer to work 
with Java, but I am willing to do something else if it makes more sense.
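
The cluster does not care which client wrote a document: anything indexed 
over the native transport (port 9300) is also searchable over HTTP/REST 
(port 9200) once the index refreshes. A minimal sketch, with hypothetical 
index and type names:

TransportClient client = new TransportClient(ImmutableSettings.settingsBuilder()
        .put("cluster.name", "elasticsearch").build());
client.addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
client.prepareIndex("docs", "doc", "1")
        .setSource("{\"title\":\"hello\"}")
        .execute().actionGet();
// The JavaScript web app can now read the same document over REST, e.g.
// GET http://localhost:9200/docs/doc/1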



TransportClient not connecting

2014-01-22 Thread ZenMaster80
I can't seem to figure out this problem. Node from NodeBuilder works, but 
if I use TransportClient like below, I get an exception.
// I am using all default settings
// elasticsearch-0.90.9

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "elasticsearch").build();
TransportClient client = new TransportClient(settings);
client.addTransportAddress(new InetSocketTransportAddress("localhost", 9300));






Exception in thread "main" org.elasticsearch.client.transport.NoNodeAvailableException: No node available
    at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:213)
    at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:106)
    at org.elasticsearch.client.support.AbstractClient.bulk(AbstractClient.java:149)
    at org.elasticsearch.client.transport.TransportClient.bulk(TransportClient.java:346)
    at org.elasticsearch.action.bulk.BulkRequestBuilder.doExecute(BulkRequestBuilder.java:165)
    at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)
    at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)
    at EntryPoint.createBulkIndexes(EntryPoint.java:305)
    ...
    at EntryPoint.main(EntryPoint.java:147)






Re: TransportClient not connecting

2014-01-22 Thread ZenMaster80
Anyone using TransportClient from Java?

On Wednesday, January 22, 2014 12:04:30 PM UTC-5, ZenMaster80 wrote:

 I can't seem to figure out this problem. Node from NodeBuilder works, but 
 if I use TransportClient like below, I get an exception.
 // I am using all default settings
 // elasticsearch-0.90.9

 Settings settings = ImmutableSettings.settingsBuilder()
         .put("cluster.name", "elasticsearch").build();
 TransportClient client = new TransportClient(settings);
 client.addTransportAddress(new InetSocketTransportAddress("localhost", 9300));

 Exception in thread "main" org.elasticsearch.client.transport.NoNodeAvailableException: No node available
 [stack trace as in the original post above]








Re: TransportClient not connecting

2014-01-22 Thread ZenMaster80
Brian,

This is no different from what I have. I googled the problem, and I guess 
this may come from the fact that ES is using a different Java version. I 
have added the es 0.90.0 jar to Java from the ES installation folder. I 
have no clue what I am missing.

On Wednesday, January 22, 2014 2:02:57 PM UTC-5, InquiringMind wrote:

 ImmutableSettings.Builder settingsBuilder = 
 ImmutableSettings.settingsBuilder();
 settingsBuilder.put(cluster.name, clusterName);
 TransportClient client = new TransportClient(settingsBuilder.build());

 for (String host : hostNames)
 {
   InetSocketTransportAddress server_address = new 
 InetSocketTransportAddress(
   host, portTransport);

   client.addTransportAddress(server_address);
 }

 Brian




Re: TransportClient not connecting

2014-01-22 Thread ZenMaster80
 

java version "1.7.0_11"
Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)

I spent too much time on this, so I gave up. I'll ask the question 
differently: I wanted to use the TransportClient at 9300 so I can index a 
file, with the intent to search it via http://localhost:9300/_search... for 
demo purposes, since I didn't want to search it using Java code. I am able 
to index the file with NodeBuilder; is there a way I can query it using 
HTTP? My understanding is that the node is local. Can I query it somehow 
over HTTP?
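
As a sketch of the distinction, assuming ES 0.90.x defaults: a node built 
with .local(true) uses a JVM-local transport and is not reachable over the 
network at all, while a non-local node joins the cluster and serves REST 
over HTTP on port 9200 (9300 is the binary transport port and does not 
speak HTTP):

// Classes are from org.elasticsearch.node.* and org.elasticsearch.client.*
Node node = NodeBuilder.nodeBuilder()
        .clusterName("elasticsearch")
        .local(false)          // local(true) would keep the node JVM-internal
        .node();
Client client = node.client();
// Index through 'client' as before, then query over HTTP on port 9200, e.g.
// http://localhost:9200/_search?q=title:hello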

On Wednesday, January 22, 2014 10:02:14 PM UTC-5, Ross Simpson wrote:

 java -version will tell you the exact version, patch level, vendor, and 
 architecture of that JVM.  The tricky bit can be finding out which JVM 
 you're actually using (usually the value in $JAVA_HOME or `which java` will 
 lead you in the right direction).  If you're running your example 
 under an IDE, it might well be using a different JVM than the ES server.

 I've had similar troubles to what you describe, but not on the localhost. 
  Do you get the exception right away, or some time after starting up your 
 client?

 Ross


 On Thursday, 23 January 2014 10:34:40 UTC+11, ZenMaster80 wrote:

 Yes, I do have 0.90.9 across the board.
 I know 9300 is open.
 I am not sure how to check if both are using the same JVM?
 es.yml is default; default cluster name, node name... I only have the 
 default (1 node)... Do I need to specify unicast instead of the default, 
 which I believe uses multicast?

 On Wednesday, January 22, 2014 3:25:26 PM UTC-5, Jörg Prante wrote:

 You wrote that you have a 0.90.9 cluster but you added 0.90.0 jars to 
 the client. Is that correct?

 Please check:

 - if your cluster nodes and client node is using exactly the same JVM

 - if your cluster and client use exactly the same ES version 

 - if your cluster and client use the same cluster name

 - reasons outside ES: IP blocking, network reachability, network 
 interfaces, IPv4/IPv6 etc.

 Then you should be able to connect with TransportClient.

 Jörg





Return specific field and highlights via Java API

2014-01-20 Thread ZenMaster80
I am having two issues using the Java API:
1. I am not able to return a specific field in my search query - it shows I 
have the right number of results, but displays null.
2. Highlights are not returned.
Note: assume indexing is fine, because I am able to get correct results if 
I comment out the line .addField("uid").
I am using defaults for everything. I understand that for highlights, 
_source for the field has to be enabled, but I thought that if not, it 
grabs the original source.

json:
{"uid":"123", "name":"hello"},
{"uid":"1234", "name":"hello1"}

node = NodeBuilder.nodeBuilder()
        .local(true)
        .data(true)
        .node();

client = node.client();

// ..createIndex

private void search(String index, String type, String field, String value)
{
    SearchResponse response = client.prepareSearch(index)
            .setTypes(type)
            // query elided in the original post; a match query as an example:
            .setQuery(QueryBuilders.matchQuery(field, value))
            .addHighlightedField("uid")
            .addField("uid")
            .execute().actionGet();

    SearchHit[] results = response.getHits().getHits();

    System.out.println("Current results: " + results.length);
    for (SearchHit hit : results) {
        System.out.println("--");
        Map<String, Object> result = hit.getSource();
        System.out.println(result);
    }
}
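
One thing worth noting about the snippet above: when .addField("uid") is 
used, the hits come back with fields instead of _source, so hit.getSource() 
returns null. A sketch of reading the returned field instead:

for (SearchHit hit : results) {
    SearchHitField uid = hit.field("uid");   // null if the hit has no uid field
    if (uid != null) {
        System.out.println("uid: " + uid.getValue());
    }
}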




Re: Indexing PDF and other binary formats

2014-01-16 Thread ZenMaster80
Thanks for the reply. The attachment plugin, I understand, encodes content 
before indexing it; this sounds like an expensive operation if we have lots 
of PDFs. I was thinking of extracting text from the PDFs early on instead, 
and dealing with plain text (a sketch of that follows below).
Does the plugin also work for binaries like images?
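
A minimal sketch of that extract-first approach, assuming Apache Tika is on 
the classpath and using hypothetical file, index, and field names:

import java.io.File;
import org.apache.tika.Tika;
import org.elasticsearch.common.xcontent.XContentFactory;

Tika tika = new Tika();
// parseToString() handles PDF, HTML, Office formats, etc.; it throws
// IOException/TikaException on unreadable input.
String text = tika.parseToString(new File("/data/folder1/doc1.pdf"));
client.prepareIndex("docs", "doc")
        .setSource(XContentFactory.jsonBuilder()
                .startObject()
                .field("content", text)
                .endObject())
        .execute().actionGet();

For images there is no text to extract unless you add an OCR step; Tika on 
its own will only give you the image metadata.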

On Thursday, January 16, 2014 4:12:47 PM UTC-5, David Pilato wrote:

 You can use Tika by yourself (recommended). See how I did it in the 
 fsriver project.
 You can use the mapper attachment plugin, which uses Tika behind the 
 scenes but gives you less control IMHO.

 About versions, elasticsearch does not keep old versions around. If you 
 need that, you have to manage it yourself.

 HTH

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 On 16 Jan 2014 at 20:42, ZenMaster80 (sabda...@gmail.com) wrote:

 - Is there any literature on how to index pdf documents and binary formats 
 like images?
 - Versioning question: If I update an already indexed document, I believe 
 ES will update the version number. I am wondering if it keeps the previous 
 document, what if I needed access to the previous document?






Re: How to query Elastic Search from my web app?

2014-01-15 Thread ZenMaster80
Great, I also found this helpful, by simply making Ajax calls:
http://www.elasticsearch.org/tutorials/javascript-web-applications-and-elasticsearch/


On Thursday, January 16, 2014 1:00:44 AM UTC-5, David Pilato wrote:

 This? 
 http://www.elasticsearch.org/blog/client-for-node-js-and-the-browser/

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


 On 16 Jan 2014 at 06:18, ZenMaster80 (sabda...@gmail.com) wrote:

 I am not very clear on how to do this. I have the following scenario:
 My data/docs are indexed using the Scala/native Java API.
 - I would like to use the REST HTTP API to access ES. What I would like to 
 understand is how I can query the ES server from my web application written 
 in JavaScript; are there any existing APIs that I can use with JavaScript? 
 I understand that I can't use curl, for instance, with JavaScript.
 - What's the best approach to this in order to make the solution/code 
 maintainable and scalable?





How to approach Indexing for a newbie?

2014-01-14 Thread ZenMaster80
I have a project that used an old search engine and I would like to move 
things to ElasticSearch. I have been doing some reading, and I wanted some 
perspective on how to approach the problem.
- I have bundles (folders) of text/html/pdf/img documents; each folder has 
an average of 50-100 documents, and each document is about 100 KB in size.
- The number of folders and documents can increase and decrease, mostly 
increase, but very slightly.

I understand that txt/html will need to be turned into JSON now, and 
somehow I will have to create an index and add these documents to the index 
for indexing. I have some questions that I don't fully understand still:
1- How do I know how many indices I need?
2- How do I know how many shards to allocate when creating the index?
3- How do I know how many nodes are needed, and how do I make things scale 
up and down? Is there a way to idle things when no indexing is happening?
4- How do I add documents to the index for indexing? I always see examples 
with JSON snippets, but in reality I have something like 
folder1{doc1,doc2,..doc100}, folder2{docA...docN} ...
5- This is probably a dumb question... Is there a preferable language to 
use for the indexing calls? If I were to build an app to call the REST API, 
which language do I need to use to do this, if at all?

Thanks again for the help.



Re: How to approach Indexing for a newbie?

2014-01-14 Thread ZenMaster80
Wow, this is exactly what I was looking for. I am a bit curious about #5. I 
am assuming there is a Java API to access ES; is there any link on how to 
get started using Java with ES? I would like to know how to import the ES 
framework/API into a Java project.

Thanks again, this is a great clarification!

On Tuesday, January 14, 2014 4:17:31 PM UTC-5, Jörg Prante wrote:

 1. Mostly, indexes are the result of a partition design outside ES, for 
 example by time, user, or data origin. The beauty of ES is that it can host 
 as many indexes as you wish.

 2. If the maximum number of nodes (hosts) you want to devote to ES is 
 known, use that node number for the number of shards, so you make sure your 
 cluster can scale. If the number is not known, try to estimate the total 
 number of documents to get indexed, the total volume of those documents, 
 and an estimated index volume per shard. Rule of thumb: a shard should be 
 sized so it can fit into the Java heap and so that it can be moved between 
 nodes in reasonable time (~1-10 GB). (See the create-index sketch after 
 this list.)

 3. You can scale up by adding nodes - just start ES on another host. Scale 
 down is also easy, stop ES on a node.

 4. You have to write a program that traverses your folders, picks up each 
 document, and extracts fields from the document to get them indexed. With 
 scrutmydocs.org you can experiment how this works by using such a file 
 traverser which is already prepared to handle quite a lot of file types 
 automatically.

 5. You should consider using one of the standard clients. As ES supports 
 HTTP REST, and the standard clients are designed to support a comparable 
 set of features, it does not matter what language you use. Just pick your 
 favorite language. (My personal favorite is Java, where there is no need to 
 use HTTP REST, instead the native transport protocol can be used)

 Jörg
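
A sketch for point 2, using the Java API from the threads above: the shard 
count is fixed when an index is created, so it is set explicitly at 
creation time (the index name and counts below are placeholders):

client.admin().indices().prepareCreate("docs")
        .setSettings(ImmutableSettings.settingsBuilder()
                .put("number_of_shards", 4)     // cannot be changed later without reindexing
                .put("number_of_replicas", 1))  // can be changed at runtime
        .execute().actionGet();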





Re: How to approach Indexing for a newbie?

2014-01-14 Thread ZenMaster80
Thanks. I added the .jar as a dependency in a simple Java project using 
Eclipse.
I get this error when I try to run the program; any clues?

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/util/Version
    at org.elasticsearch.Version.<clinit>(Version.java:42)
    at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:121)
    at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
    at org.elasticsearch.node.NodeBuilder.node(NodeBuilder.java:166)
    at EntryPoint.main(EntryPoint.java:25)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.util.Version
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    ... 5 more



On Tuesday, January 14, 2014 5:22:22 PM UTC-5, Jörg Prante wrote:

 To get an overview of what is possible, look at the Elasticsearch test 
 sources at 
 https://github.com/elasticsearch/elasticsearch/tree/master/src/test/java/org/elasticsearch

 There are many code snippets that are useful for learning how to use the 
 Java API.

 You can use Elasticsearch by adding the jar as a dependency in your 
 project (with Maven it is very easy).

 Jörg





Re: How to index an existing json file

2014-01-08 Thread ZenMaster80
Thank you for the binary flag tip. It is also in the documentation here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html

On Tuesday, January 7, 2014 9:00:33 PM UTC-5, ZenMaster80 wrote:

 Hi,

 I am just starting with ElasticSearch, I would like to know how to index a 
 simple json document books.json that has the following in it: Where do I 
 place the document? I placed it in root directory of elastic search and in 
 /bin folder..

 {“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get 
 rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda 
 Jones”]}}


 $ curl -XPUT "http://localhost:9200/books/book/1" -d @books.json

 Warning: Couldn't read data from file "books.json", this makes an empty 
 POST.

 {"error":"MapperParsingException[failed to parse, document is 
 empty]","status":400}


 Thanks




How to index an existing json file

2014-01-07 Thread ZenMaster80
Hi,

I am just starting with ElasticSearch. I would like to know how to index a 
simple json document, books.json, that has the following in it. Where do I 
place the document? I placed it in the root directory of elasticsearch and 
in the /bin folder..

{“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get 
rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda 
Jones”]}}


$ curl -XPUT "http://localhost:9200/books/book/1" -d @books.json

Warning: Couldn't read data from file "books.json", this makes an empty 
POST.

{"error":"MapperParsingException[failed to parse, document is 
empty]","status":400}


Thanks



Re: How to index an existing json file

2014-01-07 Thread ZenMaster80
Great. Do you know why I am getting:

{"error":"MapperParsingException[failed to parse]; nested: 
JsonParseException[Unrecognized token 'life': was expecting ('true', 
'false' or 'null')\n at [Source: [B@5c9a9d06; line: 1, column: 35]]; 
","status":400}

data:

{“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get 
rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda 
Jones”}]}
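
The curly "smart" quotes in the pasted data are not valid JSON string 
delimiters, which is the likely cause of this parse error. For comparison, 
the same payload with plain ASCII quotes and balanced brackets parses 
cleanly:

{"books":[{"name":"life in heaven","author":"Mike Smith"},{"name":"get 
rich","author":"Joe Shmoe"},{"name":"luxury properties","author":"Linda 
Jones"}]}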



On Tuesday, January 7, 2014 9:06:01 PM UTC-5, Ivan Brusic wrote:

 The JSON file is used by the curl command, so in your example it should be 
 in the same directory in which you executed the command (current directory).

 -- 
 Ivan


 On Tue, Jan 7, 2014 at 6:00 PM, ZenMaster80 (sabda...@gmail.com) wrote:

 Hi,

 I am just starting with ElasticSearch, I would like to know how to index 
 a simple json document books.json that has the following in it: Where do 
 I place the document? I placed it in root directory of elastic search and 
 in /bin folder..

 {“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get 
 rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda 
 Jones”]}}


 $ curl -XPUT "http://localhost:9200/books/book/1" -d @books.json

 Warning: Couldn't read data from file "books.json", this makes an empty 
 POST.

 {"error":"MapperParsingException[failed to parse, document is 
 empty]","status":400}


 Thanks




