Re: EC2 Discovery
I am not sure if I missed something, but I believe I already tried what you mentioned, as shown in my original post. I can connect to each machine individually, and I am able to index and query it fine with the default configuration, without any zen or ec2 settings. But when I turn them on as shown in the post, I get "Request failed to get to the server (status code: 0)" when trying to query the instance. Did you mean I should try to see if I can access one instance from the other? That I haven't tried yet. On Friday, March 21, 2014 4:46:40 AM UTC-4, Norberto Meijome wrote: Don't try ec2 discovery until you have tested that: - you can connect from one machine to another on port 9300 (nc as client and server, basic networking/firewalling) - you can run a simple aws ec2 describe-instances call with the API key you plan to use and see the machines you need there. Bonus points for filtering based on the rules you intend to use (sec group, tags). This is to ensure your API keys have the correct access needed. Once you have those basic steps working, use them in the es config. Make sure you enable ec2 discovery and disable zen discovery (it will run first and likely time out, and ec2 disco won't get to exec). The other thing to watch out for is contacting nodes which are too busy to ack your new node's request for cluster info... but that would be a problem with zen disco too. On 21/03/2014 12:31 PM, Raphael Miranda raphael...@gmail.com wrote: are both machines in the same security group?
Re: EC2 Discovery
I am not sure if I missed something, but I believe I already tried what you mentioned, as shown in my original post. I can connect from one instance to another. I can connect to each machine individually, and I am able to index and query it fine with the default configuration, without any zen or ec2 settings. But when I turn them on as shown in the post, I get "Request failed to get to the server (status code: 0)" when trying to query the instance, and when I do this it won't even log anything, so it is not getting that far. On Friday, March 21, 2014 4:46:40 AM UTC-4, Norberto Meijome wrote: Don't try ec2 discovery until you have tested that: - you can connect from one machine to another on port 9300 (nc as client and server, basic networking/firewalling) - you can run a simple aws ec2 describe-instances call with the API key you plan to use and see the machines you need there. Bonus points for filtering based on the rules you intend to use (sec group, tags). This is to ensure your API keys have the correct access needed. Once you have those basic steps working, use them in the es config. Make sure you enable ec2 discovery and disable zen discovery (it will run first and likely time out, and ec2 disco won't get to exec). The other thing to watch out for is contacting nodes which are too busy to ack your new node's request for cluster info... but that would be a problem with zen disco too. On 21/03/2014 12:31 PM, Raphael Miranda raphael...@gmail.com wrote: are both machines in the same security group?
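For reference, the node-side configuration Norberto describes usually boils down to a few lines in elasticsearch.yml. This is only a sketch: it assumes the cloud-aws plugin is installed on every node, and the cluster name, security group and credentials are placeholders.

    cluster.name: my-cluster
    discovery.type: ec2
    discovery.zen.ping.multicast.enabled: false   # keep multicast zen pinging from running first and timing out
    discovery.ec2.groups: my-security-group       # placeholder; filtering by tag is also possible
    cloud.aws.access_key: YOUR_ACCESS_KEY
    cloud.aws.secret_key: YOUR_SECRET_KEY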
EC2 Discovery
Any clues as to what I am missing? I turned discovery trace on, but I don't see any useful info.
Re: Bulk Processor
David, Sorry, I didn't quite follow - does it do the flushing automatically, or am I supposed to tell it to? On Wednesday, March 12, 2014 4:05:49 PM UTC-4, David Pilato wrote: It also flushes docs after a given time, let's say every 5 seconds. BTW there is a small issue which basically flushes the bulk every n-1 docs instead of n. Fix is on the way. -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs On 12 March 2014 at 20:51, ZenMaster80 sabda...@gmail.com wrote: I don't quite understand what the bulk processor is doing; I would like someone to explain how it is supposed to work to make sure I designed this correctly. I specify the number of actions as 1000. My feeder keeps pushing documents to it - it's more like a loop iterating document folders, and I push each document to the bulk. I expected the bulk to queue things until it reaches 1000 docs, then process the bulk? Yet this is how it logs (this comes from the callback functions of the bulk processor): Bulk Called: ID= 1, Actions=33, MB=5.46250 Bulk Called: ID= 2, Actions=29, MB=5.51660 Bulk Succeeded: ID= 1, took= 921 ms Bulk Called: ID= 3, Actions=12, MB=5.691812 Bulk Succeeded: ID= 2, took= 1526 ms ... Bulk Called: ID= 23, Actions=8, MB=5.45294 Bulk Succeeded: ID= 23, took= 751 ms Bulk Called: ID= 24, Actions=19, MB=5.383918 Bulk Succeeded: ID= 24, took= 331 ms Bulk Called: ID= 25, Actions=22, MB=5.347542 Bulk Succeeded: ID= 25, took= 694 ms Bulk Called: ID= 26, Actions=58, MB=5.249195 Bulk Succeeded: ID= 26, took= 583 ms Bulk Called: ID= 27, Actions=89, MB=5.244396 Bulk Succeeded: ID= 27, took= 588 ms ... Bulk Called: ID= 47, Actions=17, MB=5.245771 ... Bulk Succeeded: ID= 47, took= 431 ms Finished Processing the whole thing
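For reference, the timed flush David mentions and the callbacks that produce the "Bulk Called"/"Bulk Succeeded" lines are wired up on the builder. A minimal sketch against the 1.x Java API; client is an existing Client, and the thresholds shown simply make the defaults explicit:

    import org.elasticsearch.action.bulk.BulkProcessor;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.common.unit.ByteSizeUnit;
    import org.elasticsearch.common.unit.ByteSizeValue;
    import org.elasticsearch.common.unit.TimeValue;

    BulkProcessor bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
        @Override
        public void beforeBulk(long executionId, BulkRequest request) {
            // fires just before a batch is sent: the "Bulk Called" lines above
            System.out.println("Bulk Called: ID=" + executionId + ", Actions=" + request.numberOfActions());
        }
        @Override
        public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
            // fires when the batch comes back: the "Bulk Succeeded" lines above
            System.out.println("Bulk Succeeded: ID=" + executionId + ", took=" + response.getTookInMillis() + " ms");
        }
        @Override
        public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
            failure.printStackTrace();
        }
    })
    .setBulkActions(1000)                               // flush after 1000 actions...
    .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // ...or 5 MB of payload...
    .setFlushInterval(TimeValue.timeValueSeconds(5))    // ...or every 5 seconds, whichever comes first
    .setConcurrentRequests(1)
    .build();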
Re: Occational client.transport.NoNodeAvailableException
I will post logs in a bit. I plan to run on EC2, but currently I am just running on a local i7 machine with 4 GB RAM. I had int concurrentRequests = Runtime.getRuntime().availableProcessors(); (returns 8). If I change this value to just 1, I don't get the exception, but indexing performance slows down considerably. I am not sure if 8 concurrent requests is really overwhelming the node. On Friday, March 14, 2014 3:58:21 PM UTC-4, Binh Ly wrote: I'm curious, is there anything else in the es log files? Also, are you running on EC2 micro instances?
Mapping Attachment plugin installation/Debian
I am having trouble finding how to install the above plugin. I installed Elasticsearch with the Debian package. Typically, on my local Linux machine I ran bin/plugin, but I am not sure where bin/plugin lives with the Debian installation? Thanks
Re: [Ann] Elasticsearch Image Plugin 1.1.0 released
Great, I am interested in trying this. On Thursday, March 13, 2014 7:09:38 AM UTC-4, Kevin Wang wrote: Hi All, I've released version 1.1.0 of the Elasticsearch Image Plugin. The Image Plugin is a Content Based Image Retrieval plugin for Elasticsearch using LIRE (Lucene Image Retrieval). It allows users to index images and search for similar images. Changes in 1.1.0: - Added limit in image query - Added plugin version in es-plugin.properties https://github.com/kzwang/elasticsearch-image Also, I've created a demo website for this plugin (http://demo.elasticsearch-image.com/); it has 1,000,000 images (well, it hasn't finished indexing all the images yet, but it should be able to demo this plugin) from the MIRFLICKR-1M collection (http://press.liacs.nl/mirflickr). Thanks, Kevin
How to install Mapping attachment Plugin with debian install
On my local machine, I do this: bin/plugin -install ... With the Debian installation, I am not sure where the bin/plugin folder is? Anyone know?
Re: How to install Mapping attachment Plugin with debian install
Thanks - I figured it out as soon as I posted. I found this explained the directory structure well: https://gist.github.com/mystix/5460660 On Thursday, March 13, 2014 1:48:07 PM UTC-4, David Pilato wrote: It should be in /usr/share/elasticsearch/bin/ -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr On 13 March 2014 at 17:19:49, ZenMaster80 (sabda...@gmail.com) wrote: On my local machine, I do this: bin/plugin -install ... With the Debian installation, I am not sure where the bin/plugin folder is? Anyone know?
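For anyone else landing here, the install then typically comes down to something like the following (the mapper-attachments version is illustrative; pick the release that matches your Elasticsearch version):

    sudo /usr/share/elasticsearch/bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/2.0.0
    sudo service elasticsearch restart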
Bulk Processor
I don't quite understand what the bulk processor is doing; I would like someone to explain how it is supposed to work to make sure I designed this correctly. I specify the number of actions as 1000. My feeder keeps pushing documents to it - it's more like a loop iterating document folders, and I push each document to the bulk. I expected the bulk to queue things until it reaches 1000 docs, then process the bulk? Yet this is how it logs (this comes from the callback functions of the bulk processor): Bulk Called: ID= 1, Actions=33, MB=5.46250 Bulk Called: ID= 2, Actions=29, MB=5.51660 Bulk Succeeded: ID= 1, took= 921 ms Bulk Called: ID= 3, Actions=12, MB=5.691812 Bulk Succeeded: ID= 2, took= 1526 ms ... Bulk Called: ID= 23, Actions=8, MB=5.45294 Bulk Succeeded: ID= 23, took= 751 ms Bulk Called: ID= 24, Actions=19, MB=5.383918 Bulk Succeeded: ID= 24, took= 331 ms Bulk Called: ID= 25, Actions=22, MB=5.347542 Bulk Succeeded: ID= 25, took= 694 ms Bulk Called: ID= 26, Actions=58, MB=5.249195 Bulk Succeeded: ID= 26, took= 583 ms Bulk Called: ID= 27, Actions=89, MB=5.244396 Bulk Succeeded: ID= 27, took= 588 ms ... Bulk Called: ID= 47, Actions=17, MB=5.245771 ... Bulk Succeeded: ID= 47, took= 431 ms Finished Processing the whole thing
Bulk Processor question
I don't quite understand what the bulk processor is doing; I would like someone to explain how it is supposed to work to make sure I designed this correctly. I specify the number of actions as 1000. My feeder keeps pushing documents to it - it's more like a loop iterating document folders where I push each document to the bulk. I expected the bulk to queue things until it reaches 1000 docs? Then process the bulk? Yet this is how it logs; this comes from the callback functions of the bulk processor: Bulk Called: ID= 1, Actions=33, MB=5.46250 Bulk Called: ID= 2, Actions=29, MB=5.51660 Bulk Succeeded: ID= 1, took= 921 ms Bulk Called: ID= 3, Actions=12, MB=5.691812 Bulk Succeeded: ID= 2, took= 1526 ms ... Bulk Called: ID= 23, Actions=8, MB=5.45294 Bulk Succeeded: ID= 23, took= 751 ms Bulk Called: ID= 24, Actions=19, MB=5.383918 Bulk Succeeded: ID= 24, took= 331 ms Bulk Called: ID= 25, Actions=22, MB=5.347542 Bulk Succeeded: ID= 25, took= 694 ms Bulk Called: ID= 26, Actions=58, MB=5.249195 Bulk Succeeded: ID= 26, took= 583 ms Bulk Called: ID= 27, Actions=89, MB=5.244396 Bulk Succeeded: ID= 27, took= 588 ms ... Bulk Called: ID= 47, Actions=17, MB=5.245771 ... Bulk Succeeded: ID= 47, took= 431 ms Finished Processing the whole thing
Re: Bulk Processor question
My docs vary in size - some are very small, some are PDFs, as shown in the log there. How do you suggest I handle this, since I don't know in advance whether the docs will be small or large? On Wednesday, March 12, 2014 4:01:53 PM UTC-4, Jörg Prante wrote: BulkProcessor has two thresholds: the number of actions (as you use by setting it to 1000) or a bulk request byte volume (default 5M). What you see is the 5M limit kicking in; your docs are quite large. Jörg On Wed, Mar 12, 2014 at 8:54 PM, ZenMaster80 sabda...@gmail.com wrote: I don't quite understand what the bulk processor is doing; I would like someone to explain how it is supposed to work to make sure I designed this correctly. I specify the number of actions as 1000. My feeder keeps pushing documents to it - it's more like a loop iterating document folders where I push each document to the bulk. I expected the bulk to queue things until it reaches 1000 docs? Then process the bulk? Yet this is how it logs; this comes from the callback functions of the bulk processor: Bulk Called: ID= 1, Actions=33, MB=5.46250 Bulk Called: ID= 2, Actions=29, MB=5.51660 Bulk Succeeded: ID= 1, took= 921 ms Bulk Called: ID= 3, Actions=12, MB=5.691812 Bulk Succeeded: ID= 2, took= 1526 ms ... Bulk Called: ID= 23, Actions=8, MB=5.45294 Bulk Succeeded: ID= 23, took= 751 ms Bulk Called: ID= 24, Actions=19, MB=5.383918 Bulk Succeeded: ID= 24, took= 331 ms Bulk Called: ID= 25, Actions=22, MB=5.347542 Bulk Succeeded: ID= 25, took= 694 ms Bulk Called: ID= 26, Actions=58, MB=5.249195 Bulk Succeeded: ID= 26, took= 583 ms Bulk Called: ID= 27, Actions=89, MB=5.244396 Bulk Succeeded: ID= 27, took= 588 ms ... Bulk Called: ID= 47, Actions=17, MB=5.245771 ... Bulk Succeeded: ID= 47, took= 431 ms Finished Processing the whole thing
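If the goal is to make the 1000-action threshold the one that actually triggers, a sketch along these lines raises the byte threshold out of the way (the 50 MB figure is purely illustrative, not a recommendation, and listener is a BulkProcessor.Listener like the one sketched earlier in this digest):

    BulkProcessor bulkProcessor = BulkProcessor.builder(client, listener)
            .setBulkActions(1000)                                // flush after 1000 docs...
            .setBulkSize(new ByteSizeValue(50, ByteSizeUnit.MB)) // ...or 50 MB, whichever is reached first
            .build();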
BulkProcessor
If I set the bulk size to 5000 files and I feed it 5000, 5000, 5000, ... what happens if the number of files in the last batch is, for instance, 2000? How does it know that it needs to process the last 2000?
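The short answer is that a trailing, sub-threshold batch is sent either by the flush interval, if one is configured, or when the processor is closed. A minimal sketch (the index/type names and the toJson helper are placeholders):

    for (File f : lastBatch) {
        bulkProcessor.add(Requests.indexRequest("docs").type("doc").source(toJson(f)));
    }
    bulkProcessor.close(); // flushes whatever is still buffered (e.g. the final 2000 actions) before shutting down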
Re: indexing binary
Binh, Thanks. With your help I think I am closer to the answer. With the sample mapping you provided, I should be able to provide the base64 contents of the image file as the content field, and the OCR text as the text field. So, when the OCR text is searched, I can return the content, which is the image. With the above mapping I believe the image is saved in the _source as well as in the field for highlighting purposes. Can I prevent it from being stored in _source with something like this? startObject("_source").field("enabled", "no").endObject() On Thursday, February 27, 2014 8:29:25 AM UTC-5, Binh Ly wrote: You certainly can add a new field, and then just put the OCR text into that new field. So for example: Mapping: PutMappingResponse putMappingResponse = new PutMappingRequestBuilder( client.admin().indices()).setIndices(INDEX_NAME).setType(DOCUMENT_TYPE).setSource( XContentFactory.jsonBuilder().startObject() .field(DOCUMENT_TYPE).startObject() .field("properties").startObject() .field("text").startObject() .field("type", "string") .endObject() .field("file").startObject() .field("store", "yes") .field("type", "attachment") .field("fields").startObject() .field("file").startObject() .field("store", "yes") .endObject() .endObject() .endObject() .endObject() .endObject() .endObject() ).execute().actionGet(); Then put the OCR text into the text field: IndexResponse indexResponse = client.prepareIndex(INDEX_NAME, DOCUMENT_TYPE, "1") .setSource(XContentFactory.jsonBuilder().startObject() .field("text", ocrText) .field("file").startObject() .field("content", fileContents) .field("_indexed_chars", -1) .endObject() .endObject() ).execute().actionGet(); You probably don't need to index the image binary information - not sure what you would need it for.
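Roughly, yes - as a sketch, the _source switch sits at the top of the type mapping like this (note that highlighting and returning the document from Elasticsearch then only work for fields that are explicitly stored):

    XContentBuilder mapping = XContentFactory.jsonBuilder().startObject()
            .startObject(DOCUMENT_TYPE)
                .startObject("_source").field("enabled", false).endObject() // do not keep the raw JSON (incl. base64) in _source
                .startObject("properties")
                    .startObject("text").field("type", "string").endObject()
                .endObject()
            .endObject()
        .endObject();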
Re: indexing binary
Sorry for the confusion - I do want PDFs, but I am concerned with the retrieval of the image file when its OCR text is searched. I must be missing something. As shown below, I provide two fields, text and content. In your second post you say I don't need the content field for images? So how does the search return the image to the asking client (a web app, for instance) when a text match occurs with the image's OCR text? If I only include text, then it will return the text part of the image only and not the image, correct? source(XContentFactory.jsonBuilder() .startObject() .field("text", ocrText) // extracted OCR text from the image .field("file").startObject() .field("content", fileContents) // content is the base64-encoded string of the image file? is it needed? .field("_indexed_chars", -1) .endObject() .endObject() On Thursday, February 27, 2014 1:16:36 PM UTC-5, Binh Ly wrote: Oh, the attachment part is for your PDF. If you don't need to index PDFs then just remove that part: PutMappingResponse putMappingResponse = new PutMappingRequestBuilder( client.admin().indices()).setIndices(INDEX_NAME).setType(DOCUMENT_TYPE).setSource( XContentFactory.jsonBuilder().startObject() .field(DOCUMENT_TYPE).startObject() .field("properties").startObject() .field("text").startObject() .field("type", "string") .endObject() .endObject() .endObject() .endObject() ).execute().actionGet(); Indexing: IndexResponse indexResponse = client.prepareIndex(INDEX_NAME, DOCUMENT_TYPE, "1") .setSource(XContentFactory.jsonBuilder().startObject() .field("text", ocrText) .endObject() ).execute().actionGet();
indexing binary
I index PDFs using Apache Tika with the following mapping: .field("type", "attachment") .field("fields") .startObject() .startObject("file") .field("store", "yes") .endObject() I want to index photos, and I am able to extract text using OCR. I am confused about how to index the text, though - do I treat it like any document and not as an attachment? I have the text as a String when extracted, and not base64 like in the case of PDFs. I am confused about how it gets stored and how it works if I need to make it available during search. Can someone explain how I do this? XContentFactory.jsonBuilder().startObject() .startObject(INDEX_TYPE) .startObject("_source").field("enabled", "no").endObject() // this line should prevent the whole base64 _source from being stored .startObject("properties") So my photo object becomes something like this - what about the source (the image itself)? jsonObject { "content": "text extracted from image", "name": "my_photo.png" } // add to the bulk indexer for indexing bulkProcessor.add(Requests.indexRequest(INDEX_NAME).type(INDEX_TYPE).id( jsonObject.getString("name")).source(jsonObject.toString()));
Re: TransportSerializationException: Failed to deserialize exception response from stream
I ran into the same problem - the version was correct and the plugins were installed. In my case port 9300 was not opened for the TransportClient; once I opened it, it worked fine. On Thursday, February 20, 2014 9:06:42 AM UTC-5, Tiago Rodrigues wrote: I get this error sometimes when I try to create an index. My version of Java on Elasticsearch is the same as on the client server. The error does not always occur, which is different from what is seen in other posts. The log: Exception in thread "main" org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:169) at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:123) at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296) at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462) at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443) at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303) at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268) at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255) at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109) at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90) at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.EOFException at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2598) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1318) at java.io.ObjectInputStream.access$300(ObjectInputStream.java:206) at java.io.ObjectInputStream$GetFieldImpl.readFields(ObjectInputStream.java:2153) at java.io.ObjectInputStream.readFields(ObjectInputStream.java:540) at java.net.InetSocketAddress.readObject(InetSocketAddress.java:282) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at
Indexing Images
I am a bit confused about this topic. I would like to index images (PNGs, JPEGs, GIFs, ...); my understanding is that I need to extract and index the text portions of the images - I don't really care about the metadata. So I looked online and decided to use Apache Tika, which I also use to extract text from and index PDFs (PDFs work fine). - How do I get the text part of images? All I am able to extract is metadata, which I don't need. - Ideally I want to say: if this image has no text to extract, then discard/ignore it. Can you please clarify this topic a bit more and provide any samples if available? Additionally, I don't want to store the base64-encoded document. PutMappingResponse putMappingResponse = new PutMappingRequestBuilder( client.admin().indices() ).setIndices(INDEX_NAME).setType(INDEX_TYPE).setSource( XContentFactory.jsonBuilder().startObject() .startObject(INDEX_TYPE) .startObject("_source").field("enabled", "no").endObject() // I believe this line will not store the whole base64 _source; below I store only the text portion of the file .startObject("properties") .startObject("file") .field("term_vector", "with_positions_offsets") .field("store", "no") .field("type", "attachment") .field("fields") .startObject() .startObject("file") .field("store", "yes") .endObject() .endObject() .endObject() .endObject() .endObject() .endObject() ).execute().actionGet(); public static void testImage(File file) throws IOException, SAXException, TikaException { Tika tika = new Tika(); InputStream inputStream = new BufferedInputStream(new FileInputStream(file)); Metadata metadata = new Metadata(); ContentHandler handler = new DefaultHandler(); Parser parser = new JpegParser(); ParseContext context = new ParseContext(); String mimeType = tika.detect(inputStream); metadata.set(Metadata.CONTENT_TYPE, mimeType); parser.parse(inputStream, handler, metadata, context); for (int i = 0; i < metadata.names().length; i++) { // metadata - I don't care about this String name = metadata.names()[i]; System.out.println(name + ": " + metadata.get(name)); } }
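One point worth separating out: the DefaultHandler in the snippet above discards body text, so even when Tika can read text it never reaches the caller; a BodyContentHandler captures it. Plain Tika does not OCR pixels, though - for scanned images an external OCR engine (e.g. Tesseract) has to produce the text before indexing. A minimal sketch:

    import java.io.BufferedInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.sax.BodyContentHandler;

    public static String extractText(File file) throws Exception {
        InputStream in = new BufferedInputStream(new FileInputStream(file));
        try {
            BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no character limit
            Metadata metadata = new Metadata();
            new AutoDetectParser().parse(in, handler, metadata, new ParseContext());
            String text = handler.toString().trim();
            return text.isEmpty() ? null : text; // null = nothing to index, so the file can be skipped
        } finally {
            in.close();
        }
    }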
Re: Indexing Images
Thanks David. I agree that OCR, and maybe any kind of text extraction, should be done before Elasticsearch indexing. But I am just wondering if Apache Tika supports this, or if anyone has experience with a particular tool. I do plan to do the extraction before indexing. On Thursday, February 20, 2014 11:38:31 AM UTC-5, ZenMaster80 wrote: I am a bit confused about this topic. I would like to index images (PNGs, JPEGs, GIFs, ...); my understanding is that I need to extract and index the text portions of the images - I don't really care about the metadata. So I looked online and decided to use Apache Tika, which I also use to extract text from and index PDFs (PDFs work fine). - How do I get the text part of images? All I am able to extract is metadata, which I don't need. - Ideally I want to say: if this image has no text to extract, then discard/ignore it. Can you please clarify this topic a bit more and provide any samples if available? Additionally, I don't want to store the base64-encoded document. PutMappingResponse putMappingResponse = new PutMappingRequestBuilder( client.admin().indices() ).setIndices(INDEX_NAME).setType(INDEX_TYPE).setSource( XContentFactory.jsonBuilder().startObject() .startObject(INDEX_TYPE) .startObject("_source").field("enabled", "no").endObject() // I believe this line will not store the whole base64 _source; below I store only the text portion of the file .startObject("properties") .startObject("file") .field("term_vector", "with_positions_offsets") .field("store", "no") .field("type", "attachment") .field("fields") .startObject() .startObject("file") .field("store", "yes") .endObject() .endObject() .endObject() .endObject() .endObject() .endObject() ).execute().actionGet(); public static void testImage(File file) throws IOException, SAXException, TikaException { Tika tika = new Tika(); InputStream inputStream = new BufferedInputStream(new FileInputStream(file)); Metadata metadata = new Metadata(); ContentHandler handler = new DefaultHandler(); Parser parser = new JpegParser(); ParseContext context = new ParseContext(); String mimeType = tika.detect(inputStream); metadata.set(Metadata.CONTENT_TYPE, mimeType); parser.parse(inputStream, handler, metadata, context); for (int i = 0; i < metadata.names().length; i++) { // metadata - I don't care about this String name = metadata.names()[i]; System.out.println(name + ": " + metadata.get(name)); } }
Re: Searching PDF
So, what's wrong with this? GET localhost:9200/_search { "fields": "file", "query": { "match_all": {} } } .. "hits": { "total": 1, "max_score": 1, "hits": [ { "_index": "docs", "_type": "pdf", "_id": "1", "_score": 1, "fields": { "file": "JVBERi0xLjQNJeLjz9MNCjE1OCAwIG9iaiA8PC9MaW5lYXJpemVkIDEvTCAzODExNDQvTyAxNjMvRSAyNDcxMS9OIDEzL1QgMzc3OTM2L0ggWyAxMTU2IDQ2OF0+Pg1lbmRvYmoNICAgICAgICAgICAgDQp4cmVmDQoxNTggNDMNCjAwMDAwMDAwMTYgMDAwMDAgbg0KMDAwMDAwMTYyNCAwMDAwMCBuDQowMDAwMDAxNzk0IDAwMDAwIG4NCjAwMDAwMDE4MjAgMDAwMDAgbg0KMDAwMDAwMTg2NiAwMDAwMCBuDQowMDAwMDAxOTAwIDAwMDAwIG4NCjAwMDAwMDIxMDkgMDAwMDAgbg0KMDAwMDAwMjE4OSAwMDAwMCBuDQowMDAwMDAyMjY3IDAwMDAwIG4NCjAwMDAwMDIzNDQgMDAwMDAgbg0KMDAwMDAwMjQyMSAwMDAwMCBuDQowMDAwMDAyNDk4IDAwMDAwIG4NCjAwMDAwMDI1NzUgMDAwMDAgbg0KMDAwMDAwMjY1MiAwMDAwMCBuDQowMDAwMDAyNzI5IDAwMDAwIG4NCjAwMDAwMDI4MDYgMDAwMDAgbg0KMDAwMDAwMjg4MyAwMDAwMCBuDQowMDAwMDAyOTYwIDAwMDAwIG4NCjAwMDAwMDMwMzYgMDAwMDAgbg0KMDAwMDAwMzE5OCAwMDAwMCBuDQowMDAwMDAzNjMwIDAwMDAwIG4NCjAwMDAwMDM2NjYgMDAwMDAgbg0KMDAwMDAwMzkwMCAwMDAwMCBuDQowMDAwMDAzOTc3IDAwMDAwIG4NCjAwMDAwMDQwNTMgMDAwMDAgbg0KMDAwMDAwNDkxMSAwMDAwMCBuDQowMDAwMDA1NzA5IDAwMDAwIG4NCjAwMDAwMD On Friday, February 7, 2014 4:48:46 PM UTC-5, Binh Ly wrote: You should be able to get the textual field values by explicitly requesting them from fields. For example: GET localhost:9200/_search { "fields": "*", "query": { "match_all": {} } }
Re: Searching PDF
You are correct - my JSON mapping had a wrong entry. Thanks for the help! On Friday, February 7, 2014 6:10:50 PM UTC-5, Binh Ly wrote: It looks like that indexing code might not be correct. I just tried this code and it works for me: try { String fileContents = readContent(new File("fn6742.pdf")); try { DeleteIndexResponse deleteIndexResponse = new DeleteIndexRequestBuilder( client.admin().indices(), INDEX_NAME ).execute().actionGet(); if (deleteIndexResponse.isAcknowledged()) { System.out.println("Deleted index"); } } catch (Exception e) { // ignore } CreateIndexResponse createIndexResponse = new CreateIndexRequestBuilder( client.admin().indices(), INDEX_NAME ).execute().actionGet(); if (createIndexResponse.isAcknowledged()) { System.out.println("Created index"); } PutMappingResponse putMappingResponse = new PutMappingRequestBuilder( client.admin().indices() ).setIndices(INDEX_NAME).setType(DOCUMENT_TYPE).setSource( XContentFactory.jsonBuilder().startObject() .field("doc").startObject() .field("properties").startObject() .field("file").startObject() .field("term_vector", "with_positions_offsets") .field("store", "yes") .field("type", "attachment") .field("fields").startObject() .field("file").startObject() .field("store", "yes") .endObject() .endObject() .endObject() .endObject() .endObject() .endObject() ).execute().actionGet(); if (putMappingResponse.isAcknowledged()) { System.out.println("Successfully defined mapping"); } IndexResponse indexResponse = client.prepareIndex(INDEX_NAME, DOCUMENT_TYPE, "1") .setSource(XContentFactory.jsonBuilder() .startObject() .field("file").startObject() .field("content", fileContents) .field("_indexed_chars", -1) .endObject() .endObject() ).execute().actionGet(); System.out.println("Document indexed success: " + indexResponse.isCreated()); } catch (Exception e) { System.out.println(e.toString()); } And then when I query: { "fields": "*", "query": { "match_all": {} } } I get back this: { "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 1.0, "hits": [ { "_index": "msdocs", "_type": "doc", "_id": "1", "_score": 1.0, "fields": { "file": [ "\n1\nISL99201\nCAUTION: These devices are sensitive to electrostatic discharge; follow proper IC Handling Procedures.\n1-888-INTERSIL or 1-888-468-3774" ] } } ] } }
searching while indexing
I am unclear on how searching works while indexing. Let's say I already have a document indexed (version 1), and I update the document, so I index it again (version 2). What happens when a user searches while version 2 is being indexed? Will the user get results from version 1?
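Concretely, the sequence looks roughly like this with the Java API (index, type and id are illustrative, and the explicit refresh is only there to make the example deterministic - normally you just wait out the default 1s refresh interval):

    // version 2 is written under the same id; searches still return version 1 for the moment
    client.prepareIndex("docs", "doc", "42")
            .setSource("{\"title\":\"version 2\"}")
            .execute().actionGet();

    // only after a refresh does the newly written version become visible to search
    client.admin().indices().prepareRefresh("docs").execute().actionGet();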
Re: Improving Bulk Indexing
Good to know - I will keep this in mind, even though I will try to go for SSD, as I personally have had great success with them in the past! When you say 10-12 MB/sec, is this with doc parsing/processing or just ES index time? For my humble test on a quad-core laptop, I am pushing 6 MB/sec with processing and 9 MB/sec if I don't include processing time. I tried playing with many different settings; I think this is about all it is going to do given the machine I am running on. On Tuesday, February 4, 2014 4:22:10 PM UTC-5, Jörg Prante wrote: My use case is bibliographic data indexing for academic and public libraries. There are ~100m records from various sources that I regularly extract, transform into JSON-LD, and load into Elasticsearch. Some are files, some are fetched by JDBC. I have six 32-core servers in our place, organized in 2 ES clusters. Self installed and configured - no cloud VMs :) With bulk indexing I can push around 10-12 MB/sec to an ES cluster. Transforming docs is rather complex and needs re-processing of indexed data. The job is done in a few hours so I can perform ETL every night. No SSD - too expensive - but SAS-2 (6 Gbit/sec) RAID-0 drives of ~1TB per server. Jörg On Tue, Feb 4, 2014 at 5:22 PM, ZenMaster80 sabda...@gmail.com wrote: Jörg, Great, I learned a lot about the process from your responses. Could you elaborate more on your use case? Mine I think will be similar to yours, where processing/feeding is on one server and I will use the transport client; the index nodes will be on EC2. So, when I do get to setting up EC2 nodes, I believe I should mostly be looking for big cores and SSD. For the current test, besides running long feeds to gauge performance and checking the analyzers, I take it there isn't much else I can do to make a significant impact? On Tuesday, February 4, 2014 3:11:14 AM UTC-5, Jörg Prante wrote: SSD will improve overall performance very much, yes. Disk drives are the slowest part in the chain and this will help. No more low IOPS, so it will significantly reduce the load on CPU (less IO waits). More RAM will not help that much. In fact, more RAM will slow down persisting; it increases pressure on the memory-to-disk part. ES obviously does not depend on large RAM for persisting data - some MB suffice - but you can try and see for yourself. 85 MB is not sufficient for testing index segment merging and GC effects; you should run a bulk indexing feed not for seconds, but for at least 20-30 minutes, if not for hours. Also check if your mapping can be simplified: the less complex the analyzers, the faster ES can index. You should also measure how long your feed program takes to process your input without the bulk indexing part. Then you see a bottom line, and maybe more room for improvement outside ES. In my use case, it helped to move the feed program to another server and use the TransportClient, with a speedup of ~30%. I agree that 5.5 MB/sec is not the end of the line, but that heavily depends on your hard- and software configuration (machine, OS, file systems, JVM). Jörg
Re: Improving Bulk Indexing
Jörg, Just so I understand this: if I were to index 100 MB worth of data in total with chunk volumes of 5 MB each, this means I have to index 20 times. If I were to set the bulk size to 20 MB, I would have to index 5 times. This is a small data size; picture that I have millions of documents. Are you saying the first method is better because the GC operations would be faster? Thanks again On Monday, February 3, 2014 9:47:46 AM UTC-5, Jörg Prante wrote: Note, bulk operates just on the network transport level, not on the index level (there are no transactions or chunks). Bulk saves network roundtrips, while the execution of index operations is essentially the same as if you transferred the operations one by one. To change the refresh interval to -1, use an update settings request like this: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html ImmutableSettings.Builder settingsBuilder = ImmutableSettings.settingsBuilder(); settingsBuilder.put("refresh_interval", -1); UpdateSettingsRequest updateSettingsRequest = new UpdateSettingsRequest(myIndexName) .settings(settingsBuilder); client.admin().indices() .updateSettings(updateSettingsRequest) .actionGet(); Jörg
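A follow-up sketch on the same API: switch refresh off only for the duration of the load and restore the default afterwards ("myindex" is a placeholder; prepareUpdateSettings is just the builder form of the request shown above):

    client.admin().indices().prepareUpdateSettings("myindex")
            .setSettings(ImmutableSettings.settingsBuilder().put("refresh_interval", "-1"))
            .execute().actionGet();

    // ... run the bulk feed ...

    client.admin().indices().prepareUpdateSettings("myindex")
            .setSettings(ImmutableSettings.settingsBuilder().put("refresh_interval", "1s"))
            .execute().actionGet();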
Re: Improving Bulk Indexing
Thanks again for clarifying this. I think I understand it now; what I was referring to in my prior posts was the difference between setting 1,000 documents vs 10,000 documents - I was thinking a bigger chunk volume would produce fewer index requests over the wire, but I understand your reasoning about thrashing and slow GC. The numbers below kind of support my theory: as I increased the chunk to 10 MB or 10,000 docs, I saw a slight improvement in total indexing time (I think). I would like to get your/others' feedback on some numbers/benchmarks. I tested with BulkRequest and with BulkProcessor, both with similar results (which I seem to think are slow?).
- Same source for testing (85 MB)
- Running one node / 1 shard / 0 replicas on a local MacBook, 8 cores, 4 GB RAM
- Bulk batch size 1 MB, concurrentRequests = 1: indexed 85 MB in ~17 seconds
- Bulk batch size 1 MB, concurrentRequests = 8: indexed 85 MB in ~15 seconds
- Bulk batch size 5 MB, concurrentRequests = 1: indexed 85 MB in ~15 seconds
- Bulk batch size 5 MB, concurrentRequests = 8: indexed 85 MB in ~17 seconds
- Bulk batch size 10 MB, concurrentRequests = 1: indexed 85 MB in ~13 seconds
- Bulk batch size 10 MB, concurrentRequests = 8: indexed 85 MB in ~13 seconds
Using number of docs:
- Bulk 1,000 docs, concurrentRequests = 1: indexed 85 MB in ~15 seconds
- Bulk 1,000 docs, concurrentRequests = 8: indexed 85 MB in ~13 seconds
- Bulk 10,000 docs, concurrentRequests = 1: indexed 85 MB in ~15 seconds
- Bulk 10,000 docs, concurrentRequests = 8: indexed 85 MB in ~12-13 seconds
OK, so an average of 15 sec for 85 MB, about 5.5 MB/sec. Why do I think this is slow? I am not sure if I am doing the right math, but for 20 million docs (27 TB of data), this will take 2 days? I understand that with better machines (SSD, more RAM) I will get better results. However, I would like to optimize what I have now to the fullest before scaling up. What other configuration can I tweak to improve my current test? .put("client.transport.sniff", true) .put("refresh_interval", -1) .put("number_of_shards", 1) .put("number_of_replicas", 0) On Monday, February 3, 2014 2:02:32 PM UTC-5, Jörg Prante wrote: Not sure if I understand. If I had to index a pile of documents, say 15M, I would build bulk requests of 1000 documents, where each doc is on average ~1K, so I end up at ~1MB. I would not care about different doc sizes as they even out over the total amount. Then I send this bulk request over the wire. With a threaded bulk feeder, I can control concurrent bulk requests of up to the number of CPU cores, say 32 cores. Then repeat. In total, I send 15K bulk requests. The effect is that on the ES cluster, each bulk request of 1M size allocates only few resources on the heap and the bulk request can be processed fast. If the cluster is slow, the client sees the ongoing bulk requests piling up before bulk responses are returned, and can control bulk capacity against a maximum concurrency limit. If the cluster is fast, the client receives responses almost instantly, and the client can decide whether it is more appropriate to increase bulk request size or concurrency. Does it make sense? Jörg On Mon, Feb 3, 2014 at 5:06 PM, ZenMaster80 sabda...@gmail.com wrote: Jörg, Just so I understand this: if I were to index 100 MB worth of data in total with chunk volumes of 5 MB each, this means I have to index 20 times. If I were to set the bulk size to 20 MB, I would have to index 5 times. This is a small data size; picture that I have millions of documents. Are you saying the first method is better because the GC operations would be faster? Thanks again On Monday, February 3, 2014 9:47:46 AM UTC-5, Jörg Prante wrote: Note, bulk operates just on the network transport level, not on the index level (there are no transactions or chunks). Bulk saves network roundtrips, while the execution of index operations is essentially the same as if you transferred the operations one by one. To change the refresh interval to -1, use an update settings request like this: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html ImmutableSettings.Builder settingsBuilder = ImmutableSettings.settingsBuilder(); settingsBuilder.put("refresh_interval", -1); UpdateSettingsRequest updateSettingsRequest = new UpdateSettingsRequest(myIndexName) .settings(settingsBuilder); client.admin().indices() .updateSettings(updateSettingsRequest) .actionGet(); Jörg
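One clarification on the settings listed above, since they are of two different kinds: client.transport.sniff belongs on the TransportClient, while the shard, replica and refresh settings belong on the index. A sketch (index and cluster names are placeholders):

    Settings clientSettings = ImmutableSettings.settingsBuilder()
            .put("cluster.name", "elasticsearch")
            .put("client.transport.sniff", true)    // client-side setting
            .build();

    client.admin().indices().prepareCreate("myindex")
            .setSettings(ImmutableSettings.settingsBuilder()
                    .put("number_of_shards", 1)
                    .put("number_of_replicas", 0)
                    .put("refresh_interval", "-1")) // index-side settings
            .execute().actionGet();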
Loading JSON to ElasticSearch
I would like to get your perspective on how to load JSON into the index server in my scenario. We have about 15 million documents in HTML/PDF/... on Server 1. I would like to process the data and convert it to JSON on Server 2. I would like the indexer to index the JSON on a separate machine/server, Server 3. Ideally, I thought that on Server 2, as I prepare the JSON and have it ready in memory, I could feed it to the indexer. But since the data processing is CPU intensive, I want indexing to be done on a separate machine/server. How do you deal with this, since I can no longer feed in-memory JSON to the indexer on a separate machine? Do I just grab files from Server 2 and index them then?
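One common pattern for this split, sketched below: the conversion job on Server 2 keeps its JSON in memory and pushes it straight to the Elasticsearch node on Server 3 over port 9300 with a TransportClient, so nothing has to be written to disk in between (host, cluster, index and type names are placeholders):

    Client client = new TransportClient(
                ImmutableSettings.settingsBuilder().put("cluster.name", "my-cluster").build())
            .addTransportAddress(new InetSocketTransportAddress("server3.internal", 9300));

    for (String json : convertedDocs) {   // convertedDocs: the in-memory JSON strings built on Server 2
        client.prepareIndex("docs", "doc").setSource(json).execute().actionGet();
    }
    client.close();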
Re: Loading JSON to ElasticSearch
Thanks David, I will certainly look into logstash. Do you think it is a good idea to separate data analysis and indexing onto 2 different machines, since both require lots of CPU time? If I use logstash to send files over to ES, will I be able to use the native Java API or HTTP, and is there any preference for the API? I have noticed there are some things that aren't very easy, and may not even work, in the native API? Thanks again. On Tuesday, January 28, 2014 1:05:32 PM UTC-5, David Pilato wrote: Did you try https://github.com/dadoonet/fsriver? Never tested it with so many docs, but maybe it could help you here? If you have already generated JSON files on a server, then I would recommend trying logstash to send them into elasticsearch. My 2 cents -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr On 28 January 2014 at 16:46:06, ZenMaster80 (sabda...@gmail.com) wrote: I would like to get your perspective on how to load JSON into the index server in my scenario. We have about 15 million documents in HTML/PDF/... on Server 1. I would like to process the data and convert it to JSON on Server 2. I would like the indexer to index the JSON on a separate machine/server, Server 3. Ideally, I thought that on Server 2, as I prepare the JSON and have it ready in memory, I could feed it to the indexer. But since the data processing is CPU intensive, I want indexing to be done on a separate machine/server. How do you deal with this, since I can no longer feed in-memory JSON to the indexer on a separate machine? Do I just grab files from Server 2 and index them then?
Re: Loading JSON to ElasticSearch
Thanks David, I will certainly look into logstash. Do you think it is a good idea to separate data analysis and indexing onto 2 different machines, since both require lots of CPU time? If I use logstash to send files over to ES, will I be able to use the native Java API or HTTP, and is there any preference for the API? I have noticed there are some things that aren't very easy, and may not even work, in the native API? Thanks again On Tuesday, January 28, 2014 1:05:32 PM UTC-5, David Pilato wrote: Did you try https://github.com/dadoonet/fsriver? Never tested it with so many docs, but maybe it could help you here? If you have already generated JSON files on a server, then I would recommend trying logstash to send them into elasticsearch. My 2 cents -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr On 28 January 2014 at 16:46:06, ZenMaster80 (sabda...@gmail.com) wrote: I would like to get your perspective on how to load JSON into the index server in my scenario. We have about 15 million documents in HTML/PDF/... on Server 1. I would like to process the data and convert it to JSON on Server 2. I would like the indexer to index the JSON on a separate machine/server, Server 3. Ideally, I thought that on Server 2, as I prepare the JSON and have it ready in memory, I could feed it to the indexer. But since the data processing is CPU intensive, I want indexing to be done on a separate machine/server. How do you deal with this, since I can no longer feed in-memory JSON to the indexer on a separate machine? Do I just grab files from Server 2 and index them then?
Native client or REST
I thought I understood this, but maybe not. I hope someone can shed some light on it. I have to index tons of files, and I would like to be able to query them from our web application written in JavaScript; everything will be running on AWS EC2. Question: if I index the files using the native Java API, will I be able to perform queries/searches from the web application via the HTTP/REST API? I am curious to know how people approach this. Note: I would prefer to work with Java, but I am willing to do something else if it makes more sense.
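To the question itself: yes. The transport protocol (port 9300) and the REST API (port 9200) are just two doors into the same cluster, so anything indexed with the native Java client is immediately searchable over HTTP from a JavaScript front end, and vice versa. A minimal sketch; the index, type and field names are illustrative assumptions:

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class IndexThenRest {
    public static void main(String[] args) {
        Client client = new TransportClient(
                ImmutableSettings.settingsBuilder().put("cluster.name", "elasticsearch").build())
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));

        // Index one document through the native Java API (transport protocol, port 9300).
        client.prepareIndex("files", "file", "1")
                .setSource("{\"name\":\"report.pdf\",\"content\":\"hello world\"}")
                .setRefresh(true)   // make it searchable right away for the demo
                .execute().actionGet();
        client.close();

        // The same document is now visible to any HTTP client, e.g. from the web app:
        //   GET http://<es-host>:9200/files/_search?q=content:hello
    }
}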
TransportClient not connecting
I can't seem to figure out this problem. Node from NodeBuilder works, but if I use TransportClient like below, I get an exception.

// I am using all default settings
// elasticsearch-0.90.9
Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "elasticsearch").build();
TransportClient client = new TransportClient(settings);
client.addTransportAddress(new InetSocketTransportAddress("localhost", 9300));

Exception in thread "main" org.elasticsearch.client.transport.NoNodeAvailableException: No node available
    at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:213)
    at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:106)
    at org.elasticsearch.client.support.AbstractClient.bulk(AbstractClient.java:149)
    at org.elasticsearch.client.transport.TransportClient.bulk(TransportClient.java:346)
    at org.elasticsearch.action.bulk.BulkRequestBuilder.doExecute(BulkRequestBuilder.java:165)
    at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)
    at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)
    at EntryPoint.createBulkIndexes(EntryPoint.java:305)
    at EntryPoint.main(EntryPoint.java:147)
Re: TransportClient not connecting
Anyone using TransportClient from Java?

On Wednesday, January 22, 2014 12:04:30 PM UTC-5, ZenMaster80 wrote:
I can't seem to figure out this problem. Node from NodeBuilder works, but if I use TransportClient like below, I get an exception.

// I am using all default settings
// elasticsearch-0.90.9
Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "elasticsearch").build();
TransportClient client = new TransportClient(settings);
client.addTransportAddress(new InetSocketTransportAddress("localhost", 9300));

Exception in thread "main" org.elasticsearch.client.transport.NoNodeAvailableException: No node available
    at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:213)
    at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:106)
    at org.elasticsearch.client.support.AbstractClient.bulk(AbstractClient.java:149)
    at org.elasticsearch.client.transport.TransportClient.bulk(TransportClient.java:346)
    at org.elasticsearch.action.bulk.BulkRequestBuilder.doExecute(BulkRequestBuilder.java:165)
    at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)
    at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)
    at EntryPoint.createBulkIndexes(EntryPoint.java:305)
    at EntryPoint.main(EntryPoint.java:147)
Re: TransportClient not connecting
Brian, this is no different from what I have. I googled the problem, and I guess it may come from the fact that ES is using a different Java version. I have added the es 0.90.0 jar to the Java project from the ES installation folder. I have no clue what I am missing.

On Wednesday, January 22, 2014 2:02:57 PM UTC-5, InquiringMind wrote:

ImmutableSettings.Builder settingsBuilder = ImmutableSettings.settingsBuilder();
settingsBuilder.put("cluster.name", clusterName);
TransportClient client = new TransportClient(settingsBuilder.build());
for (String host : hostNames) {
    InetSocketTransportAddress server_address = new InetSocketTransportAddress(host, portTransport);
    client.addTransportAddress(server_address);
}

Brian
Re: TransportClient not connecting
java version "1.7.0_11"
Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)

I spent too much time on this, so I gave up. I'll ask the question differently: I wanted to use the transport client on 9300 so I can index a file, with the intent to search it via http://localhost:9300/_search...for demo purposes, since I didn't want to search it using Java code. I am able to index the file with NodeBuilder; is there a way I can query it using HTTP? My understanding is that the node is local, so can I query it somehow over HTTP?

On Wednesday, January 22, 2014 10:02:14 PM UTC-5, Ross Simpson wrote:
java -version will tell you the exact version, patch level, vendor, and architecture of that JVM. The tricky bit can be finding out which JVM you're actually using (usually the value in $JAVA_HOME or `which java` will lead you in the right direction). If you're running your example under an IDE, it might well be using a different JVM than the ES server. I've had similar troubles to what you describe, but not on the localhost. Do you get the exception right away, or some time after starting up your client?
Ross

On Thursday, 23 January 2014 10:34:40 UTC+11, ZenMaster80 wrote:
Yes, I do have 0.90.9 across the board. I know 9300 is open. I am not sure how to check if both are using the same JVM. es.yml is default: default cluster name, node name... I only have the default (1 node). Do I need to specify unicast instead of the default, which I believe uses multicast?

On Wednesday, January 22, 2014 3:25:26 PM UTC-5, Jörg Prante wrote:
You wrote that you have a 0.90.9 cluster but you added 0.90.0 jars to the client. Is that correct? Please check:
- if your cluster nodes and client node are using exactly the same JVM
- if your cluster and client use exactly the same ES version
- if your cluster and client use the same cluster name
- reasons outside ES: IP blocking, network reachability, network interfaces, IPv4/IPv6, etc.
Then you should be able to connect with TransportClient.
Jörg
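On the last question: the port is the likely catch. 9300 only speaks the binary transport protocol; the REST interface listens on 9200. A node started with NodeBuilder serves HTTP on 9200 like any standalone node, as long as it is not created with .local(true) (a local node lives only inside the JVM and opens no ports). A minimal sketch, with an illustrative index name:

import org.elasticsearch.client.Client;
import org.elasticsearch.node.Node;
import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

public class EmbeddedNodeDemo {
    public static void main(String[] args) throws Exception {
        // A non-local node joins the cluster and exposes both 9300 (transport) and 9200 (HTTP).
        Node node = nodeBuilder().clusterName("elasticsearch").local(false).node();
        Client client = node.client();

        client.prepareIndex("docs", "doc", "1")
                .setSource("{\"title\":\"demo\"}")
                .setRefresh(true)
                .execute().actionGet();

        // While this JVM is running, the same data is reachable over HTTP, e.g.:
        //   curl "http://localhost:9200/docs/_search?q=title:demo"
        Thread.sleep(60000);   // keep the node up long enough to try the query above
        node.close();
    }
}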
Return specific field and highlights via Java API
I am having two issues using the Java API:
1. I am not able to return a specific field in my search query - it shows I have the right number of results, but displays null.
2. Highlights are not returned.
Note: assume indexing is fine, because I am able to get correct results if I comment out the .addField() line. I am using defaults for everything. I understand that for highlights _source for the field has to be enabled, but I thought that if not, it grabs the original source.

json: {"uid":"123", "name":"hello"}, {"uid":"1234", "name":"hello1"}

node = NodeBuilder.nodeBuilder()
        // .local(true)
        // .data(true)
        .node();
client = node.client();
// ... createIndex

private void search(String index, String type, String field, String value) {
    SearchResponse response = client.prepareSearch(index)
            .setTypes(type)
            .addHighlightedField("uid")
            .addField("uid")
            .execute().actionGet();
    SearchHit[] results = response.getHits().getHits();
    System.out.println("Current results: " + results.length);
    for (SearchHit hit : results) {
        System.out.println("--");
        Map<String, Object> result = hit.getSource();
        System.out.println(result);
    }
}
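On issue 1: once .addField(...) is used, the response no longer carries the full _source, so hit.getSource() comes back null by design; the requested values arrive under hit.field(...). Highlights come back in a separate map, and only for fields the query actually matched. A minimal sketch of reading both; the match query on uid and the value "123" are illustrative assumptions:

import java.util.Map;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.highlight.HighlightField;

private void searchWithFieldsAndHighlights(Client client, String index, String type) {
    SearchResponse response = client.prepareSearch(index)
            .setTypes(type)
            .setQuery(QueryBuilders.matchQuery("uid", "123"))   // highlights need a query that matches the field
            .addField("uid")                                     // ask for the field explicitly
            .addHighlightedField("uid")
            .execute().actionGet();

    for (SearchHit hit : response.getHits().getHits()) {
        // With addField(), values live here rather than in getSource().
        String uid = hit.field("uid").getValue();
        System.out.println("uid = " + uid);

        // Highlight fragments are keyed by field name.
        Map<String, HighlightField> highlights = hit.getHighlightFields();
        HighlightField hl = highlights.get("uid");
        if (hl != null) {
            System.out.println("highlight: " + hl.getFragments()[0].string());
        }
    }
}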
Re: Indexing PDF and other binary formats
Thanks for the reply. The attachment plugin, as I understand it, encodes the content before indexing it; this sounds like an expensive operation if we have lots of PDFs. I was thinking of extracting the text from the PDFs early on instead and dealing with plain text. Does the plugin also work for binaries like images?

On Thursday, January 16, 2014 4:12:47 PM UTC-5, David Pilato wrote:
You can use Tika by yourself (recommended). See how I did it in the fsriver project. You can use the mapper attachment plugin, which uses Tika behind the scenes but gives you less control IMHO. About versions, elasticsearch does not keep old versions around. If you need that, you have to manage it yourself. HTH
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 16 January 2014 at 20:42, ZenMaster80 sabda...@gmail.com wrote:
- Is there any literature on how to index PDF documents and binary formats like images?
- Versioning question: if I update an already indexed document, I believe ES will update the version number. I am wondering if it keeps the previous document - what if I needed access to the previous document?
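On doing the extraction up front: that is essentially what fsriver and the mapper attachment plugin do internally through Apache Tika, so calling Tika yourself and indexing only the extracted text is a reasonable way to avoid shipping encoded binaries around. For plain images, Tika mainly yields metadata (EXIF and the like) rather than text. A minimal sketch, assuming the Tika jars are on the classpath and using illustrative index and field names:

import java.io.File;
import org.apache.tika.Tika;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.xcontent.XContentFactory;

public class PdfToText {
    // Extract plain text from a PDF (or any Tika-supported format) and index only the text.
    static void indexFile(Client client, File file) throws Exception {
        Tika tika = new Tika();
        String text = tika.parseToString(file);   // content type is auto-detected

        client.prepareIndex("docs", "doc")
                .setSource(XContentFactory.jsonBuilder()
                        .startObject()
                        .field("filename", file.getName())
                        .field("content", text)
                        .endObject())
                .execute().actionGet();
    }
}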
Re: How to query Elastic Search from my web app?
Great, I also found this helpful, by simply making ajax calls: http://www.elasticsearch.org/tutorials/javascript-web-applications-and-elasticsearch/

On Thursday, January 16, 2014 1:00:44 AM UTC-5, David Pilato wrote:
This? http://www.elasticsearch.org/blog/client-for-node-js-and-the-browser/
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 16 January 2014 at 06:18, ZenMaster80 sabda...@gmail.com wrote:
I am not very clear on how to do this. I have the following scenario: my data/docs are indexed using the native Java API.
- I would like to use the REST HTTP API to access ES. What I would like to understand is how I can query the ES server from my web application written in JavaScript - are there any existing APIs that I can use with JavaScript? I understand that I can't use curl, for instance, with JavaScript.
- What's the best approach to this in order to make the solution/code maintainable and scalable?
How to approach Indexing for a newbie?
I have a project that used an old search engine, and I would like to move things to ElasticSearch. I have been doing some reading, and I wanted some perspective on how to approach the problem.
- I have bundles (folders) of text/html/pdf/img documents; each folder has an average of 50-100 documents, and each document is about 100K in size.
- The number of folders and documents can increase and decrease - mostly increase, but only slightly.
I understand that txt/html will need to be turned into JSON, and somehow I will have to create an index and add these documents to it for indexing. I have some questions that I still don't fully understand:
1. How do I know how many indices I need?
2. How do I know how many shards to allocate when creating an index?
3. How do I know how many nodes are needed, and how do I make things scale up and down? Is there a way to idle things when no indexing is happening?
4. How do I add documents to the index for indexing? I always see examples with JSON snippets, but in reality I have something like folder1{doc1,doc2,..doc100}, folder2{docA...docN} ...
5. This is probably a dumb question... Is there a preferable language to use for the indexing calls? If I were to build an app to call the REST API, which language would I need to use, if any?
Thanks again for the help.
Re: How to approach Indexing for a newbie?
Wow, this is exactly what I was looking for. I am a bit curious about #5. I am assuming there is a Java API to access ES; is there any link on how to get started using Java with ES? I would like to know how to import the ES framework/API into a Java project. Thanks again, this is a great clarification!

On Tuesday, January 14, 2014 4:17:31 PM UTC-5, Jörg Prante wrote:
1. Mostly, indexes are the result of a partition design outside ES, for example by time, user, or data origin. The beauty of ES is that it can host as many indexes as you wish.
2. If the maximum number of nodes (hosts) you want to spend on ES is known, use that node number for the number of shards, so you make sure your cluster can scale. If the number is not known, try to estimate the total number of documents to be indexed, the total volume of those documents, and an estimated index volume per shard. Rule of thumb: a shard should be sized so that it fits into the Java heap and can be moved between nodes in reasonable time (~1-10 GB).
3. You can scale up by adding nodes - just start ES on another host. Scaling down is also easy: stop ES on a node.
4. You have to write a program that traverses your folders, picks up each document, and extracts fields from the document to get them indexed. With scrutmydocs.org you can experiment with how this works, using a file traverser that is already prepared to handle quite a lot of file types automatically.
5. You should consider using one of the standard clients. As ES supports HTTP REST, and the standard clients are designed to support a comparable set of features, it does not matter what language you use. Just pick your favorite language. (My personal favorite is Java, where there is no need to use HTTP REST; instead the native transport protocol can be used.)
Jörg
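For point 4 above, the traversal program can stay quite small; a sketch along those lines, assuming the text has already been extracted from each file and using illustrative index and field names:

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.xcontent.XContentFactory;

public class FolderIndexer {
    // Walk every bundle folder under 'root' and index one document per file.
    static void indexFolder(final Client client, Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                client.prepareIndex("bundles", "doc")
                        .setSource(XContentFactory.jsonBuilder()
                                .startObject()
                                .field("folder", file.getParent().getFileName().toString())
                                .field("name", file.getFileName().toString())
                                .field("content", new String(Files.readAllBytes(file), "UTF-8"))
                                .endObject())
                        .execute().actionGet();
                return FileVisitResult.CONTINUE;
            }
        });
    }
}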
Re: How to approach Indexing for a newbie?
Thanks. I added the .jar as a dependency in a simple Java project using Eclipse. I get this error when I try to run the program - any clues?

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/util/Version
    at org.elasticsearch.Version.<clinit>(Version.java:42)
    at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:121)
    at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
    at org.elasticsearch.node.NodeBuilder.node(NodeBuilder.java:166)
    at EntryPoint.main(EntryPoint.java:25)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.util.Version
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    ... 5 more

On Tuesday, January 14, 2014 5:22:22 PM UTC-5, Jörg Prante wrote:
To get an overview of what is possible, look at the Elasticsearch test sources at https://github.com/elasticsearch/elasticsearch/tree/master/src/test/java/org/elasticsearch
There are many code snippets that are useful for learning how to use the Java API. You can use Elasticsearch by adding the jar as a dependency in your project (with Maven it is very easy).
Jörg
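That particular NoClassDefFoundError usually means only the elasticsearch jar itself is on the classpath: the Lucene jars that ship in the lib/ directory of the distribution (or that arrive transitively through Maven) are missing. Either add everything under lib/ to the Eclipse build path, or declare the dependency in a Maven pom so that Lucene comes in automatically. A minimal pom fragment, assuming the 0.90.9 release used elsewhere in these threads:

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>0.90.9</version>
</dependency>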
Re: How to index an existing json file
Thank you for the binary flag tip. It is also in the documentation here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html

On Tuesday, January 7, 2014 9:00:33 PM UTC-5, ZenMaster80 wrote:
Hi, I am just starting with ElasticSearch. I would like to know how to index a simple json document, books.json, that has the following in it. Where do I place the document? I placed it in the root directory of elasticsearch and in the /bin folder.
{“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda Jones”]}}
$ curl -XPUT "http://localhost:9200/books/book/1" -d @books.json
Warning: Couldn't read data from file books.json, this makes an empty POST.
{"error":"MapperParsingException[failed to parse, document is empty]","status":400}
Thanks
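The binary flag tip refers to curl's --data-binary option: unlike -d, it preserves the newlines that the bulk API requires between action and source lines. A minimal sketch; the file name requests and the index/type are illustrative:

$ cat requests
{ "index" : { "_index" : "books", "_type" : "book", "_id" : "1" } }
{ "name" : "life in heaven", "author" : "Mike Smith" }

$ curl -XPOST "http://localhost:9200/_bulk" --data-binary @requests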
How to index an existing json file
Hi, I am just starting with ElasticSearch. I would like to know how to index a simple json document, books.json, that has the following in it. Where do I place the document? I placed it in the root directory of elasticsearch and in the /bin folder.
{“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda Jones”]}}
$ curl -XPUT "http://localhost:9200/books/book/1" -d @books.json
Warning: Couldn't read data from file books.json, this makes an empty POST.
{"error":"MapperParsingException[failed to parse, document is empty]","status":400}
Thanks
Re: How to index an existing json file
Great. Do you know why I am getting
{"error":"MapperParsingException[failed to parse]; nested: JsonParseException[Unrecognized token 'life': was expecting ('true', 'false' or 'null')\n at [Source: [B@5c9a9d06; line: 1, column: 35]];","status":400}
data: {“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda Jones”}]}

On Tuesday, January 7, 2014 9:06:01 PM UTC-5, Ivan Brusic wrote:
The JSON file is used by the curl command, so in your example it should be in the same directory in which you executed the command (the current directory).
--
Ivan

On Tue, Jan 7, 2014 at 6:00 PM, ZenMaster80 sabda...@gmail.com wrote:
Hi, I am just starting with ElasticSearch. I would like to know how to index a simple json document, books.json, that has the following in it. Where do I place the document? I placed it in the root directory of elasticsearch and in the /bin folder.
{“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda Jones”]}}
$ curl -XPUT "http://localhost:9200/books/book/1" -d @books.json
Warning: Couldn't read data from file books.json, this makes an empty POST.
{"error":"MapperParsingException[failed to parse, document is empty]","status":400}
Thanks
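One common cause of this kind of JsonParseException is that the file contains curly typographic quotes (as pasted from a word processor or web page) rather than the plain ASCII double quotes that JSON requires. A corrected books.json with the same data, and the matching command, would look roughly like this:

{"books":[{"name":"life in heaven","author":"Mike Smith"},{"name":"get rich","author":"Joe Shmoe"},{"name":"luxury properties","author":"Linda Jones"}]}

$ curl -XPUT "http://localhost:9200/books/book/1" -d @books.json

Note that this indexes the whole array as a single document with a books field; to search individual books, the usual approach is one document per book, for example via the bulk API.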