Re: Create mapping for nested json
What version of ES are you trying this on? I faced this issue due to a bug in lower versions, but I succeeded once I upgraded to a newer version. Thanks, Kr

On Mon, Apr 6, 2015 at 9:42 PM, secs...@gmail.com wrote:

The culprit seems to be Kibana :( I sort of forced ES to show its hand by explicitly analyzing and storing all fields:

curl -XPUT localhost:9200/_template/metrics -d '{
  "template": "metrics",
  "order": 2,
  "settings": {
    "index.refresh_interval": "5s"
  },
  "mappings": {
    "metric": {
      "properties": {
        "Activities": {
          "type": "object",
          "properties": {
            "ActivityName": { "type": "string", "index": "analyzed", "store": true },
            "ActivityFields": {
              "type": "object",
              "properties": {
                "FieldName": { "type": "string", "index": "analyzed", "store": true },
                "valueCounts": {
                  "type": "object",
                  "properties": {
                    "valueName": { "type": "string", "index": "analyzed", "store": true },
                    "valueCount": { "type": "integer", "index": "analyzed", "store": true }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}'

The resulting JSON in Kibana shows all the extracted fields - only it doesn't show them as facets! It discovers them but won't show them as facets/aggregates. I can search for /Activities.ActivityName: SSH/ but get no faceting. Very frustrating. Is there a workaround?

On Wednesday, April 1, 2015 at 9:46:49 PM UTC-7, sec...@gmail.com wrote:

Hi, Noob at Elasticsearch here. I am trying to push some nested JSON to Elasticsearch and have the nested objects parsed out as facets. If I use dynamic mapping, Elasticsearch does not seem to parse out the internal objects. I guess I need to define a mapping for my index? Example:

{
  "Date": "2015-03-21T00:09:00",
  "Activities": [
    {
      "ActivityName": "SSH",
      "Fields": [
        { "User":  [ { "joe": 2, "jane": 3, "jack": 5 } ] },
        { "DstIP": [ { "HostA": 3, "HostB": 5, "HostC": 6 } ] }
      ]
    }
  ]
}

I tried to follow the mapping documentation but failed to come up with a mapping that represents the JSON above. I guess I am not sure how to map lists. If it helps, here's how I create the JSON in Scala using the Jackson library:

scala> nestedMap
res3: scala.collection.immutable.Map[String,Object] = Map(Date -> 2015-03-21T00:09:00, Activities -> List(Map(ActivityName -> SSH, Fields -> List(Map(User -> List(Map(joe -> 2, jane -> 3, jack -> 5))), Map(DstIP -> List(Map(HostA -> 3, HostB -> 5, HostC -> 6)))))))

scala> println(Serialization.write(nestedMap))
{"Date":"2015-03-21T00:09:00","Activities":[{"ActivityName":"SSH","Fields":[{"User":[{"joe":2,"jane":3,"jack":5}]},{"DstIP":[{"HostA":3,"HostB":5,"HostC":6}]}]}]}

Is there a way to get Jackson to spit out a schema that can be directly fed to Elasticsearch as a mapping/template? Thanks.
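[A side note on the template above, from the editor rather than the thread: facets and terms aggregations operate on indexed terms, so an analyzed string such as ActivityName is faceted token by token, and fields meant for faceting were conventionally mapped not_analyzed in the 1.x era. Numeric fields such as valueCount are never analyzed, so "index": "analyzed" has no effect there. Under those assumptions, a facet-friendlier variant of the innermost mapping might look like:

"valueCounts": {
  "type": "object",
  "properties": {
    "valueName":  { "type": "string", "index": "not_analyzed", "store": true },
    "valueCount": { "type": "integer", "store": true }
  }
}
]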
Any Elasticsearch hosting vendors that support Couchbase-Elasticsearch transport plugin?
Does anyone know of such vendors? I already talked to one and they don't support it.

-Thanks
Rajesh
Re: Elasticsearch performance tuning
Hi Mark Walkom,

I have given my Logstash conf file below:

input {
  file {
  }
}
filter {
  mutate { gsub => [ "message", "\n", "" ] }
  mutate { gsub => [ "message", "\t", "" ] }
  multiline {
    pattern => "^ "
    what => "previous"
  }
  grok {
    match => [ "message", "%{TIME:log_time}\|%{WORD:Message_type}\|%{GREEDYDATA:Component}\|%{NUMBER:line_number}\| %{GREEDYDATA:log_message}" ]
    match => [ "path", "%{GREEDYDATA}/%{GREEDYDATA:loccode}/%{GREEDYDATA:_machine}\:%{DATE:logdate}.log" ]
    break_on_match => false
  }
  # To check whether the location is S or L
  if [loccode] == "S" or [loccode] == "L" {
    ruby {
      code => "temp = event['_machine'].split('_')
               if !temp.nil? || !temp.empty?
                 event['_machine'] = temp[0]
               end"
    }
  }
  mutate {
    add_field => [ "event_timestamp", "%{@timestamp}" ]
    replace   => [ "log_time", "%{logdate} %{log_time}" ]
    lowercase => [ "loccode" ]
    # Remove the 'logdate' field since we don't need it anymore.
    remove    => [ "logdate" ]
  }
  # To get all site details (site name, city and co-ordinates)
  sitelocator { sitename => "loccode" datafile => "vendor/sitelocator/SiteDetails.csv" }
  date {
    locale => "en"
    match  => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ]
  }
}
output {
  elasticsearch { }
}

I have checked the filters step by step to find the bottleneck. The date filter below took the most time. Can you guide me on how to tune it to run faster?

date {
  locale => "en"
  match  => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ]
}

http://serverfault.com/questions/669534/elasticsearch-performance-tuning#comment818613_669558

Thanks
Devaraj
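[One thing worth trying with that date filter, offered as the editor's suggestion rather than something confirmed in this thread: Logstash tries the listed patterns from left to right on every event, so listing the format that matches the bulk of your events first avoids repeated failed parse attempts. A sketch, assuming most of the timestamps are plain "yyyy-MM-dd HH:mm:ss":

date {
  locale => "en"
  # Most common format first; the rarer ones only run when it fails.
  match  => [ "log_time", "yyyy-MM-dd HH:mm:ss", "ISO8601", "MM-dd-yyyy HH:mm:ss.SSS" ]
}
]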
Re: Elasticsearch performance tuning
I listed the instance and heap size details below:

Medium instance: 3.75 GB RAM, 1 core, 4 GB SSD storage, 64-bit; Java heap size: 2 GB
R3 Large: 15.25 GB RAM, 2 cores, 32 GB SSD storage; Java heap size: 7 GB
R3 High-Memory Extra Large (r3.xlarge): 30.5 GB RAM, 4 cores; Java heap size: 15 GB

Thanks
Devaraj

On Friday, February 20, 2015 at 4:15:12 AM UTC+5:30, Mark Walkom wrote:

Don't change cache and buffer sizes unless you know what is happening; the defaults are going to be fine. How much heap did you give ES? I'm not sure you can do much about the date filter though, maybe someone else has pointers.

On 19 February 2015 at 21:12, Deva Raj wrote: [...]
Elasticsearch performance tuning
Hi All,

On a single node running both Logstash and Elasticsearch, we tested parsing a 20 MB and a 200 MB log file into Elasticsearch on different AWS instance types (Medium, Large, and XLarge).

Scenario 1 (Medium instance): 3.75 GB RAM, 1 core, 4 GB SSD storage, 64-bit, network performance moderate. Instance running Logstash and Elasticsearch.

With default settings:
20 MB log file: 23 mins, 175 events/sec
200 MB log file: 3 hrs 3 mins, 175 events/sec

Added the following settings:
Java heap size: 2 GB
bootstrap.mlockall: true
indices.fielddata.cache.size: 30%
indices.cache.filter.size: 30%
index.translog.flush_threshold_ops: 5
indices.memory.index_buffer_size: 50%
# Search thread pool
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100

With the added settings:
20 MB log file: 22 mins, 180 events/sec
200 MB log file: 3 hrs 7 mins, 180 events/sec

Scenario 2 (R3 Large): 15.25 GB RAM, 2 cores, 32 GB SSD storage, 64-bit, network performance moderate. Instance running Logstash and Elasticsearch.

With default settings:
20 MB log file: 7 mins, 750 events/sec
200 MB log file: 65 mins, 800 events/sec

Added settings: Java heap size 7 GB; other parameters same as above.

With the added settings:
20 MB log file: 7 mins, 800 events/sec
200 MB log file: 55 mins, 800 events/sec

Scenario 3 (R3 High-Memory Extra Large, r3.xlarge): 30.5 GB RAM, 4 cores, 32 GB SSD storage, 64-bit, network performance moderate. Instance running Logstash and Elasticsearch.

With default settings:
20 MB log file: 7 mins, 1200 events/sec
200 MB log file: 34 mins, 1200 events/sec

Added settings: Java heap size 15 GB; other parameters same as above.

With the added settings: identical results to the defaults (7 mins and 34 mins, 1200 events/sec).

I wanted to know:
1. What is the benchmark for this kind of performance?
2. Does this performance meet the benchmark, or is it below it?
3. Why do I see no difference even after increasing the Elasticsearch JVM heap?
4. How do I monitor Logstash and improve its performance?

I appreciate any help on this, as I am new to Logstash and Elasticsearch.
Re: Elasticsearch performance tuning
Hi Mark Walkom,

Thanks, Mark. Am I missing anything in my Elasticsearch performance tuning? I added the following to the Elasticsearch settings:

Java heap size: half of physical memory
bootstrap.mlockall: true
indices.fielddata.cache.size: 30%
indices.cache.filter.size: 30%
index.translog.flush_threshold_ops: 5
indices.memory.index_buffer_size: 50%

On Thursday, February 19, 2015 at 7:25:27 AM UTC+5:30, Mark Walkom wrote:

1. It depends. 2. It depends. 3. It depends. 4. It also depends.

The performance of ES is dependent on you: your data, your use, your queries, your hardware, your configuration. If those are the results you got, then they are indicative of your setup and are thus your benchmark, and from there you can tweak and try to improve performance. Monitoring LS is a little harder as there are no APIs for it (yet). Most of its performance will come down to your filters (especially grok).
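[On the monitoring question: while Logstash itself had no stats API at the time, the Elasticsearch side can be watched during a load test with the node stats API, which does exist in 1.x. A minimal sketch from the editor; the grep'd fields are just a reasonable starting set, not a prescription:

# Poll indexing throughput and heap usage every 10 seconds.
while true; do
  curl -s 'localhost:9200/_nodes/stats/indices,jvm?pretty' \
    | grep -E '"index_total"|"index_time_in_millis"|"heap_used_percent"'
  sleep 10
done
]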
Re: not able to refine from o/p of query in logstash
Can someone shed some light on my problem? I am sorry to bump this thread again. I have not been able to work out whether there is any option to call another script from within my Logstash query to meet my requirements. Please let me know if anything is not clear. Thanks in advance!
search users based on degree of connections - for typeahead functionality
I have implemented term search based on nGrams/filters/tokenizers for typeahead functionality, where the user types part of a name and matching users come up. But I have a requirement around data modeling and implementation: when someone (user A) searches for other users, matches should be ranked in the following order:

- People whom I follow
- People who follow me
- Everyone else

Has anyone solved this problem using Elasticsearch? If yes, what should the data model and mappings be?

-Thanks
Rajesh
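[One possible approach, offered as the editor's sketch rather than a settled answer from the list: index each user's ID in a not_analyzed user_id field, fetch the searcher's "following" and "followers" ID lists from wherever the social graph lives, and add them as boosting clauses on top of the ngram match. The index name, field names, and IDs below are all hypothetical:

curl -XGET 'localhost:9200/users/user/_search?pretty' -d '{
  "query": {
    "bool": {
      "must": [
        { "match": { "name.ngram": "raj" } }
      ],
      "should": [
        { "terms": { "user_id": ["id-i-follow-1", "id-i-follow-2"], "boost": 10 } },
        { "terms": { "user_id": ["id-follows-me-1", "id-follows-me-2"], "boost": 5 } }
      ]
    }
  }
}'

Matches in neither list still return, since should clauses are optional; they simply rank last.]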
Re: not able to refine from o/p of query in logstash
Can anyone help me with this problem, please?
not able to refine from o/p of query in logstash
I am using the query below to pull information from Logstash indices:

curl -XGET 'http://logs:xx00/_all/_search?pretty=true' -d '{
  "query": {
    "bool": {
      "must": [
        { "match": { "_type": "pre" } },
        { "match": { "message": "MapDone" } },
        { "range": { "@timestamp": { "gte": "now-5m" } } }
      ]
    }
  }
}'

Output:

{
  "took" : 177,
  "timed_out" : false,
  "_shards" : { "total" : 3225, "successful" : 3225, "failed" : 0 },
  "hits" : {
    "total" : 1238,
    "max_score" : 4.3801584,
    "hits" : [ {
      "_index" : "fi-logstash-2015.01.21",
      "_type" : "fi",
      "_id" : "CORYzNPHnnQeu09A",
      "_score" : 4.3801584,
      "_source" : {"thread_name":"main","message":"[MapDone]\tstandards.po.poRsxWrite in 169ms","@timestamp":"2015-01-21T14:48:59.835+00:00","level":"INFO","mdc":{},"file":"fi-1-small-log.json","class":"fi.log.MapLogHandler","line_number":"21","logger_name":"fi.Mapper","method":"info","@version":1,"source_host":"fi.pp","host":"prefi2","offset":"185244882","type":"prefi","tags":["instance"],"syslog_severity_code":5,"syslog_facility_code":1,"syslog_facility":"user-level","syslog_severity":"notice"}
    } ]
  }
}

The above is only part of the output. I am trying to get only the map name as output, but when I try, I get errors. Different sample map names:

formats.pure.qm.fromSIP.toCSV.write in 24ms
H044Grain.hub.asn.from.advanceShipNoticeWrite in 188ms
H9B1honey.hub.po.fromFEDSto.purchaseOrder in 416ms
HAEPrugs.hub.rsx.v7.r0.po.poFedsWrite in 231ms
H4Grain2.hub.in.fromtoAPP.invoiceWrite in 110ms
H2Home.v700.e4060.co.in.inFedsWrite in 108ms

I am trying to get:
1 - only the mapping names (e.g. H4Grain2.hub.in.from.invoiceWrite)
2 - unique mappings (something like piping the previous output to uniq)
3 - the average over the last 1 minute of mappings

Can anybody help check whether this is possible? Thanks a ton in advance.
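[The search API by itself won't reshape _source like that; the usual route, offered here as the editor's suggestion rather than something from the thread, is to extract the map name into its own not_analyzed field at index time (e.g. with a grok pattern in Logstash) and then aggregate on it. Assuming hypothetical map_name and duration_ms fields exist, the unique names and their average duration over the last minute come back in a single request:

curl -XGET 'http://logs:xx00/_all/_search?pretty=true' -d '{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-1m" } }
  },
  "aggs": {
    "unique_maps": {
      "terms": { "field": "map_name", "size": 100 },
      "aggs": {
        "avg_ms": { "avg": { "field": "duration_ms" } }
      }
    }
  }
}'
]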
Re: not able to refine from o/p of query in logstash
Can anyone help with this? Just bumping this email; sorry if I am breaking any list rules.
Re: Crying for help:: MapperParsingException when trying to create index with mapping
Hi Masaru,

Thank you. It did not work because we were using the 1.3.x version of Elasticsearch, and there is a bug on Solaris with mapping nested objects. I upgraded ES and the mapping is good now.

Thanks,
Krishna Raj

On Tuesday, January 6, 2015 at 2:00:01 PM UTC-8, Krishna Raj wrote:

Hi,

I am trying to create an index with a mapping that contains a nested object. I also tried to update the mapping of an empty, freshly created index with the same mapping below. But I get a MapperParsingException error every time. My cluster goes down and recovers automatically. I have been banging my head on this for the last week with no luck. Any help is greatly appreciated.

Reference: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html

My sample JSON:

{
  "timeStamp": "2014-12-31T19:15:45.000+0000",
  "metrics": [
    { "name": "viewList", "ave": 10.5 },
    { "name": "checkout", "ave": 20.5 },
    { "name": "login",    "ave": 30.5 },
    { "name": "logout",   "ave": 40.5 }
  ]
}

The mapping I am trying:

curl -XPUT 'http://myhost:9201/testagg/testagg/_mapping' -d '{
  "testagg": {
    "properties": {
      "timeStamp": {
        "format": "dateOptionalTime",
        "type": "date"
      },
      "properties": {
        "metrics": {
          "type": "nested",
          "properties": {
            "name": { "type": "string" },
            "ave":  { "type": "double" }
          }
        }
      }
    }
  }
}'

Thanks,
Krishna Raj
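[For anyone landing on this thread later: independent of the Solaris bug, note that the mapping above wraps metrics in an extra "properties" object, which declares metrics as a sub-field of a field literally named properties. Per the nested-type documentation linked above, the nested field normally sits directly under the type's properties, along these lines (the editor's reconstruction, not a fix tested against that cluster):

curl -XPUT 'http://myhost:9201/testagg/testagg/_mapping' -d '{
  "testagg": {
    "properties": {
      "timeStamp": { "type": "date", "format": "dateOptionalTime" },
      "metrics": {
        "type": "nested",
        "properties": {
          "name": { "type": "string" },
          "ave":  { "type": "double" }
        }
      }
    }
  }
}'
]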
Re: What happens to data in an existing type if we update the mapping to specify 'path's for _id and _routing
Any ideas on this?

On Monday, October 27, 2014 3:03:16 PM UTC+5:30, Preeti Raj - Buchhada wrote: [...]
What happens to data in an existing type if we update the mapping to specify 'path's for _id and _routing
We are using ES 1.3.2. We need to specify custom ID and routing values when indexing. We've been doing this using the Java APIs; however, we would now like to update the mapping to specify 'path's for _id and _routing. Our questions are:

1) Since this type already has a huge number of documents, can we change the mapping? When we tried it, we got an 'acknowledged: true' response, but it doesn't seem to take effect when we index.
2) If there is a way to achieve this, will it affect only newly indexed documents?
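[For reference, the 1.x syntax in question looks like the sketch below; the index, type, and field names are placeholders. Two caveats that bear on the questions: Elasticsearch generally rejects or silently ignores changes to _id and _routing on a type that already has documents, which would explain the no-op after the 'acknowledged' response, and these path settings were deprecated in 1.5 and removed in 2.0.

curl -XPUT 'localhost:9200/myindex' -d '{
  "mappings": {
    "mytype": {
      "_id":      { "path": "order_id" },
      "_routing": { "path": "customer_id", "required": true },
      "properties": {
        "order_id":    { "type": "string", "index": "not_analyzed" },
        "customer_id": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'

Existing documents keep whatever ID and routing they were indexed with; only documents indexed after a (permitted) mapping change would be affected.]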
Re: Is there a way to update ES records using Spark?
Does anyone have an idea? Even just knowing whether this is possible or not would be a great help. Thanks.

On Wednesday, October 1, 2014 3:46:51 PM UTC+5:30, Preeti Raj - Buchhada wrote:

I am using ES version 1.3.2 and Spark 1.1.0. I can successfully read and write records from/to ES using newAPIHadoopRDD() and saveAsNewAPIHadoopDataset(). However, I am struggling to find a way to update records. Even if I specify a 'key' in ESOutputFormat, it gets ignored, as clearly documented. So my question is: is there a way to specify the document ID and custom routing values when writing to ES using Spark? If yes, how?
Re: Is there a way to update ES records using Spark?
Thanks for your reply, Costin. However, we need to compute a custom ID by concatenating multiple field values and then computing a hash, so simply specifying 'es.mapping.id' will not help in our case. Is there any other way?

On Monday, October 13, 2014 4:08:05 PM UTC+5:30, Costin Leau wrote:

You can use the mapping options [1], namely `es.mapping.id`, to specify the id field of your documents.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/configuration.html#cfg-mapping

On Mon, Oct 13, 2014 at 12:55 PM, Preeti Raj - Buchhada wrote: [...]
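[One way to reconcile that requirement with Costin's answer, sketched by the editor rather than taken from the connector docs: compute the hash in Spark, store it in an ordinary field of the document, and point es.mapping.id at that field. The Metric shape, field names, and "metrics/metric" resource below are placeholders, and this assumes a connector version with the native EsSpark writer and es.write.operation support:

import java.security.MessageDigest

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark.rdd.EsSpark

// Hypothetical document shape; docId carries the precomputed hash.
case class Metric(host: String, metric: String, value: Double, docId: String)

object EsUpsert {
  // MD5 over the concatenated business-key fields.
  def hashId(parts: String*): String = {
    val md = MessageDigest.getInstance("MD5")
    md.digest(parts.mkString("|").getBytes("UTF-8")).map("%02x".format(_)).mkString
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("es-upsert"))

    val docs = sc.parallelize(Seq(("hostA", "cpu", 0.42), ("hostB", "mem", 0.87)))
      .map { case (host, metric, value) =>
        Metric(host, metric, value, hashId(host, metric))
      }

    // es.mapping.id makes the connector use our field as the document _id;
    // es.write.operation=upsert turns repeated runs into updates rather
    // than blind re-indexing under auto-generated IDs.
    EsSpark.saveToEs(docs, "metrics/metric", Map(
      "es.mapping.id"      -> "docId",
      "es.write.operation" -> "upsert"))

    sc.stop()
  }
}
]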
Is there a way to update ES records using Spark?
I am using ES version 1.3.2 and Spark 1.1.0. I can successfully read and write records from/to ES using newAPIHadoopRDD() and saveAsNewAPIHadoopDataset(). However, I am struggling to find a way to update records. Even if I specify a 'key' in ESOutputFormat, it gets ignored, as clearly documented. So my question is: is there a way to specify the document ID and custom routing values when writing to ES using Spark? If yes, how?
Re: ES Plugin to extend Lucene's Standard Tokenizer
Hi Vineeth,

I haven't looked at the plugin Bryan has created. However, creating a plugin for special characters gives better performance than a pattern tokenizer or custom filters.

Regards,
Raj

On Tuesday, September 9, 2014 9:06:08 AM UTC+5:30, vineeth mohan wrote:

Hello Bryan,

Congrats on your first plugin. I have a question here - could you have implemented the whole thing using the pattern tokenizer (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-tokenizer.html)? Does your plugin provide any advantage over that approach?

Thanks
Vineeth

On Tue, Sep 9, 2014 at 7:56 AM, Bryan Warner wrote:

Hi all,

Recently, I've been working on an extension to Lucene's StandardTokenizer that allows the user to customize/override the default word-boundary break rules for Unicode characters. The StandardTokenizer implements the word-break rules from the Unicode Text Segmentation algorithm (http://www.unicode.org/reports/tr29/), where most punctuation symbols (except for underscore '_') are treated as hard word breaks (e.g. @foo and #foo are tokenized to foo). While the StandardTokenizer works great in most cases, I found that being unable to override the default word-break rules was quite limiting, especially since a lot of these punctuation symbols now carry important meaning on the web (@ for mentions, # for hashtags, etc.)

I've wrapped this extension to the StandardTokenizer in an Elasticsearch plugin, which can be found at https://github.com/bbguitar77/elasticsearch-analysis-standardext ... definitely looking for feedback, as this is my first go at an Elasticsearch plugin! I'm hoping other Elasticsearch/Lucene users find it helpful.

Cheers!
Bryan
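[For comparison, here is roughly what the pattern-tokenizer route Vineeth mentions would look like: a tokenizer that splits on anything other than word characters, '@', and '#', so mentions and hashtags survive as single tokens. This settings sketch is the editor's and untested, not taken from Bryan's plugin:

curl -XPUT 'localhost:9200/social' -d '{
  "settings": {
    "analysis": {
      "tokenizer": {
        "mention_tokenizer": {
          "type": "pattern",
          "pattern": "[^\\w@#]+"
        }
      },
      "analyzer": {
        "social_text": {
          "type": "custom",
          "tokenizer": "mention_tokenizer",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}'

The trade-off Raj alludes to is that the pattern tokenizer evaluates a regex across the input, whereas a plugin that only overrides the Unicode word-break rules keeps the StandardTokenizer's single-pass behaviour.]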
Pattern_capture filter also emits tokens that do not match the pattern
I have a case where I have to extract the domain part from email addresses found in text. I used the uax_url_email tokenizer to keep each email address as a single token, and I have a pattern_capture filter that emits strings matching the @(.+) pattern. But uax_url_email also returns plain words that are not emails, and the pattern_capture filter does not filter those out. Any suggestions?

"analyzer": {
  "custom_analyzer": {
    "tokenizer": "uax_url_email",
    "filter": [ "email_domain_filter" ]
  }
},
"filter": {
  "email_domain_filter": {
    "type": "pattern_capture",
    "preserve_original": false,
    "patterns": [ "@(.+)" ]
  }
}

Input string: my email id is x...@gmail.com
Output tokens: my, email, id, is, gmail.com

But I need only gmail.com
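[pattern_capture passes a token through unchanged when none of its patterns match, which is why the plain words survive. One workaround to try, offered as the editor's unverified sketch: blank out non-email tokens with a pattern_replace filter first, then drop the now-empty tokens with a length filter before capturing the domain:

"analysis": {
  "filter": {
    "blank_non_emails": {
      "type": "pattern_replace",
      "pattern": "^[^@]*$",
      "replacement": ""
    },
    "drop_empty": {
      "type": "length",
      "min": 1
    },
    "email_domain_filter": {
      "type": "pattern_capture",
      "preserve_original": false,
      "patterns": [ "@(.+)" ]
    }
  },
  "analyzer": {
    "custom_analyzer": {
      "tokenizer": "uax_url_email",
      "filter": [ "blank_non_emails", "drop_empty", "email_domain_filter" ]
    }
  }
}
]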