Re: stuck thread problem?

2014-08-29 Thread Martin Forssen
FYI, this turned out to be a real bug. A fix has been committed and will be 
included in the next release.

On Wednesday, August 27, 2014 11:36:03 AM UTC+2, Martin Forssen wrote:

 I did report it https://github.com/elasticsearch/elasticsearch/issues/7478



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/04d9c094-112d-4d7d-bd48-e4fa2ff3a774%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Explicitly Copying Replica Shards That Fail to Start

2014-08-29 Thread David Kleiner
Thank you Mark! 

Setting 

{
  "index" : {
    "number_of_replicas" : 0
  }
}

and then back to 1 cleared the bad replicas and rebuilt them from primaries.
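In curl form, what I ran was roughly the following (index name is a placeholder for my actual indices):

```shell
# drop all replicas; the bad replica copies are deleted
curl -XPUT 'localhost:9200/myindex/_settings' -d '
{ "index": { "number_of_replicas": 0 } }'

# add them back; new copies are rebuilt from the primaries
curl -XPUT 'localhost:9200/myindex/_settings' -d '
{ "index": { "number_of_replicas": 1 } }'
```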

Much appreciated,

David

On Thursday, August 28, 2014 3:53:32 PM UTC-7, Mark Walkom wrote:

 Yep, the easiest way is to drop the replica and then add it back and see 
 how you go.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 29 August 2014 08:40, David Kleiner david@gmail.com wrote:

 Greetings,

 I am still having a problem with the recovery of 5 replica shards in 2 of 
 my indices, on a 3-node cluster.  The replica shards fail to initialize and 
 keep jumping between the two secondary nodes.  The primary shards are fine.  

 What is my path to recovery?  Is copying the primary shard to the secondary 
 nodes a correct approach?  I tried issuing routing commands to cancel 
 recovery/allocation; it helped with some secondary shards but not with the 
 5 in question. 

 I also tried dumping the index with the failing secondary shards, but two 
 nodes crashed (well, lost their connection to the cluster), so the dump failed. 

 Would setting the replica count to 0, copying the primaries to the 2 
 secondary nodes, and then setting the replica count back to 1 be a viable 
 alternative?

 Thank you,

 David

 -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/8e7c4f11-2790-49d6-8c65-87e9aa05aa3b%40googlegroups.com
  
 https://groups.google.com/d/msgid/elasticsearch/8e7c4f11-2790-49d6-8c65-87e9aa05aa3b%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a2482a81-5be8-4ed2-ad43-e37330446376%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: stuck thread problem?

2014-08-29 Thread Patrick Proniewski
Thank you!

On 29 août 2014, at 08:49, Martin Forssen m...@recordedfuture.com wrote:

 FYI, this turned out to be a real bug. A fix has been committed and will be 
 included in the next release.
 
 On Wednesday, August 27, 2014 11:36:03 AM UTC+2, Martin Forssen wrote:
 
 I did report it https://github.com/elasticsearch/elasticsearch/issues/7478
 

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/DFFDD1A6-9F76-4AC0-8211-95C47CC5CAC7%40patpro.net.
For more options, visit https://groups.google.com/d/optout.


Re: I search same thing, but once can get and once can not get???

2014-08-29 Thread fiefdx yang
I am sure there is only one master node and that all 16 nodes work as one 
cluster.
I think I know what happened: with the default configuration, successive 
identical requests are executed alternately against the primary shards and 
the replica shards.
But what I cannot understand is why the primary shards and the replica 
shards give different results at the same point in time.
This happens when I index some new documents but do not refresh; if I 
refresh the index, then the primary shards and the replica shards give the 
same result.
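I can make the difference visible like this (index name and query are placeholders for my real ones):

```shell
# after an explicit refresh, primaries and replicas agree
curl -XPOST 'localhost:9200/myindex/_refresh'

# pin the search to primary shards only
curl 'localhost:9200/myindex/_search?preference=_primary' -d '
{ "query": { "match_all": {} } }'

# default (no preference): the search round-robins between primary and replica copies
curl 'localhost:9200/myindex/_search' -d '
{ "query": { "match_all": {} } }'
```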

On Thursday, August 28, 2014 6:41:22 PM UTC+8, Greg Murnane wrote:

 This is a symptom that could happen with bad GC events, or with split 
 brain. Can you look at the GC logging output to see how long the stop the 
 world pauses you're seeing are? You can also run a query like  curl -XGET '
 http://localhost:9200/_cluster/state/master_node?local=true'  on each of 
 the nodes to make sure that they agree on which one is the master node.

 Look also at wait CPU and disk utilization when you run a query. Unless 
 you have a physical disk for each node on this system, it's likely that 
 there can be IO contention with 16 nodes querying the disks.

 If all that looks ok, if you are running replicas, then you can try 
 pulling out a replica and an original, and loading them into an isolated ES 
 node on another system, and query there. It's possible that some of the 
 replicas could be corrupted, and this would allow you to detect that.

 -

 Out of curiosity, though, I wonder what the purpose of running so many 
 nodes on a single machine is. ES is very effective at using the entire CPU 
 with only one node, and replicating your heap size 16 times, adding IO 
 contention, and splitting the cache 16 ways all seem like they would hurt 
 performance immensely.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d86c4ac0-910f-4d00-8276-bec8aed220ad%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


help needed scripting update to list element (bulk request)

2014-08-29 Thread eunever32
Hi
Say I have a list of elements like this:

PUT twitter/twit/1
{
   "list": [
      {
         "a": "b",
         "c": "d",
         "e": "f"
      },
      {
         "1": "2",
         "3": "4"
      }
   ]
}

And I want to change the value of "e" (currently "f") to, say, "new_f", so 
that the document looks like: 

{
   "list": [
      {
         "a": "b",
         "c": "d",
         "e": "new_f"
      },
      {
         "1": "2",
         "3": "4"
      }
   ]
}

Is there a way to do this ? Maybe in MVEL ?

Do I match on the document
{
   "a": "b",
   "c": "d",
   "e": "f"
}

i.e. if (list.contains(document)) { some kind of update; }  // is this 
possible?


I know MVEL is being deprecated in 1.4 however it will do for now. 

I want to use bulk request. 

I know it's possible to remove the element like this: 
bulkRequestBuilder.setScript("if (ctx._source.list.contains(document)) 
{ ctx._source.list.remove(document) }").setScriptParams etc.


but is it possible to also update a field in the document?
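What I imagine is something like the following, though I have not tested it (MVEL syntax; old_val and new_val are parameter names I made up):

```shell
curl -XPOST 'localhost:9200/twitter/twit/1/_update' -d '
{
  "script": "foreach (item : ctx._source.list) { if (item.e == old_val) { item.e = new_val } }",
  "params": { "old_val": "f", "new_val": "new_f" }
}'
```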

Thanks.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/da238495-7cf3-4215-a77e-2144499b8859%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Bulk UDP API

2014-08-29 Thread Bart Vandewoestyne
I'm trying to index data using the bulk UDP API on a single node 
Elasticsearch 1.3.2.  In my elasticsearch config I have

bulk.udp.enabled: true

My bulk file has 85000 documents and has the following characteristics:

bart@hp-g7-02:~/git/data$ ls -al mydata.json 
-rw-rw-r-- 1 bart bart 97818287 Aug 28 15:43 mydata.json

bart@hp-g7-02:~/git/data$ wc -l mydata.json 
170001 mydata.json

bart@hp-g7-02:~/git/data$ file mydata.json 
mydata.json: UTF-8 Unicode English text, with very long lines

Indexing the data using the bulk API described at 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
 
works.  I see the documents in my elasticsearch store once the bulk upload 
is finished.

However, if I use the same bulk file and try to index it using the command

cat mydata.json | nc -w 0 -u localhost 9700

then only 1 document gets indexed, and I see lots of parsing errors like 
the following in my log files:

[2014-08-29 11:28:41,649][WARN ][bulk.udp ] [Mysterio] 
failed to execute bulk request
org.elasticsearch.common.jackson.core.JsonParseException: Unrecognized 
token '_index': was expecting ('true', 'false' or 'null')
 at [Source: [B@656f95ce; line: 1, column: 15]
at org.elasticsearch.common.jackson.core.JsonParser._constructError(
JsonParser.java:1419)
at org.elasticsearch.common.jackson.core.base.ParserMinimalBase.
_reportError(ParserMinimalBase.java:508)
at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.
_reportInvalidToken(UTF8StreamJsonParser.java:3201)
at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.
_handleUnexpectedValue(UTF8StreamJsonParser.java:2360)
at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.
_nextTokenNotInObject(UTF8StreamJsonParser.java:794)
at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.
nextToken(UTF8StreamJsonParser.java:690)
at org.elasticsearch.common.xcontent.json.JsonXContentParser.
nextToken(JsonXContentParser.java:50)
at org.elasticsearch.action.bulk.BulkRequest.add(BulkRequest.java:
266)
at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.
java:256)
at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.
java:252)
at org.elasticsearch.bulk.udp.BulkUdpService$Handler.messageReceived
(BulkUdpService.java:181)
at org.elasticsearch.common.netty.channel.
SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.
java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.
sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.
sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.
fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.socket.nio.
NioDatagramWorker.read(NioDatagramWorker.java:98)
at org.elasticsearch.common.netty.channel.socket.nio.
AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.
AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.
AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.
NioDatagramWorker.run(NioDatagramWorker.java:343)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(
ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.
DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

I find it strange that things work using the usual bulk API, but not with 
the bulk UDP API.

Am I overlooking something or doing something wrong?

Thanks,
Bart

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6a676c4f-afd1-48a1-ab40-8c258aa3c54e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Replica assignment on the same host

2014-08-29 Thread 'Nicolas Fraison' via elasticsearch
Hi,

I have an ES cluster with 12 data nodes spread across 6 servers (so 2 nodes 
per server), and I noticed that the replicas of a shard can be allocated on 
the same server (one on each of the nodes hosted by that server).

To avoid this I have set these parameters on the cluster:
node.host: server_name
cluster.routing.allocation.awareness.attributes: zone, host

But I'm wondering whether there is a specific parameter for this instead of 
using cluster allocation awareness?

Nicolas

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4d3924ab-77e4-49ef-9039-52df801ff46d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Replica assignment on the same host

2014-08-29 Thread Mark Walkom
That's the best method as per
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#allocation-awareness
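There is also a dedicated setting for the several-nodes-per-server case which, if I remember the cluster module docs correctly, makes the allocator refuse to put copies of the same shard on nodes that share a host address (I haven't tested it myself):

```yaml
# elasticsearch.yml on every node (availability in your 1.x version not verified)
cluster.routing.allocation.same_shard.host: true
```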

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 29 August 2014 20:45, 'Nicolas Fraison' via elasticsearch 
elasticsearch@googlegroups.com wrote:

 Hi,

 I have an ES cluster with 12 data nodes spread across 6 servers (so 2 nodes
 per server), and I noticed that the replicas of a shard can be allocated on
 the same server (one on each of the nodes hosted by that server).

 To avoid this I have set these parameters on the cluster:
 node.host: server_name
 cluster.routing.allocation.awareness.attributes: zone, host

 But I'm wondering whether there is a specific parameter for this instead
 of using cluster allocation awareness?

 Nicolas

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/4d3924ab-77e4-49ef-9039-52df801ff46d%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/4d3924ab-77e4-49ef-9039-52df801ff46d%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624YaxHLT%3DsptzqcSQw3i9u9oozO_2DstFJ6vCs-VC_bzOw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: stuck thread problem?

2014-08-29 Thread Martijn v Groningen
Hi Patrick,

Did you see the same stuck thread via jstack or the hot threads API that
Martin reported? This can only happen if scan search was used (by
setting search_type=scan in a search request).
If that isn't the case, then perhaps something else is stuck.

Martijn


On 29 August 2014 09:58, Patrick Proniewski elasticsea...@patpro.net
wrote:

 Thank you!

 On 29 août 2014, at 08:49, Martin Forssen m...@recordedfuture.com wrote:

  FYI, this turned out to be a real bug. A fix has been committed and will
 be
  included in the next release.
 
  On Wednesday, August 27, 2014 11:36:03 AM UTC+2, Martin Forssen wrote:
 
  I did report it
 https://github.com/elasticsearch/elasticsearch/issues/7478
 

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/DFFDD1A6-9F76-4AC0-8211-95C47CC5CAC7%40patpro.net
 .
 For more options, visit https://groups.google.com/d/optout.




-- 
Met vriendelijke groet,

Martijn van Groningen

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CA%2BA76Ty1RLHNButgkgYZ3pt_L0ygtonn7y8QpM%3D-0ttC%2BM84gQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


elasticsearch template to use standard analyzer but additional token_filter word_delimiter

2014-08-29 Thread Marc
Hi,

I am using Logstash and Elasticsearch for log analysis.
The standard analyzer does a pretty good job; however, it will not split 
things like word1.word2.
Therefore, I want to add the token filter word_delimiter.
What would such an additional template look like?
Also, how can I limit this addition to certain fields only?
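What I have in mind is something like the following, though I have not tested it (template, analyzer, and field names are just placeholders):

```shell
curl -XPUT 'localhost:9200/_template/logstash_delim' -d '
{
  "template": "logstash-*",
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_plus_delim": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "word_delimiter"]
        }
      }
    }
  },
  "mappings": {
    "_default_": {
      "properties": {
        "message": { "type": "string", "analyzer": "standard_plus_delim" }
      }
    }
  }
}'
```

Limiting it to certain fields would then just be a matter of which fields in the mapping get the custom analyzer.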

Thx
Marc

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b2009958-6ab5-49f0-8027-f3259289442c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: stuck thread problem?

2014-08-29 Thread Martijn v Groningen
Hi Patrick,

If this problem happens again, you should run the hot threads API:
curl localhost:9200/_nodes/hot_threads

Documentation:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html#cluster-nodes-hot-threads

Just pick a node in your cluster and run that command. This is the
equivalent of running jstack on all the nodes in your cluster.
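If you ever do want a per-node thread dump anyway, the manual route is roughly this (it assumes a JDK, not just a JRE, is installed on the node; the pid is a placeholder):

```shell
# find the Elasticsearch process id
jps -l | grep -i elasticsearch

# dump all thread stacks of that process to a file
jstack <pid> > es-threads.txt
```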

Martijn


On 29 August 2014 13:34, Patrick Proniewski elasticsea...@patpro.net
wrote:

 Hi,

 I don't know how to debug a JAVA process. Haven't heard about jstack until
 it was mentioned in this thread.
 All I know is what I've posted in my first message.

 I've restarted ES, and currently I've no stuck thread to investigate. In
 the mean time, you can teach me how I should use jstack, so next time it
 happens I'll be ready.

 On 29 août 2014, at 13:19, Martijn v Groningen 
 martijn.v.gronin...@gmail.com wrote:

  Hi Patrick,
 
  Did you see the same stuck thread via jstack or the hot thread api that
  Martin reported? This can only happen if scan search was enabled (by
  setting search_type=scan in a search request)
  If that isn't the case then something else is maybe stuck.
 
  Martijn
 
 
  On 29 August 2014 09:58, Patrick Proniewski elasticsea...@patpro.net
  wrote:
 
  Thank you!
 
  On 29 août 2014, at 08:49, Martin Forssen m...@recordedfuture.com
 wrote:
 
  FYI, this turned out to be a real bug. A fix has been committed and
 will
  be
  included in the next release.
 
  On Wednesday, August 27, 2014 11:36:03 AM UTC+2, Martin Forssen wrote:
 
  I did report it
  https://github.com/elasticsearch/elasticsearch/issues/7478
 
 
  --
  You received this message because you are subscribed to the Google
 Groups
  elasticsearch group.
  To unsubscribe from this group and stop receiving emails from it, send
 an
  email to elasticsearch+unsubscr...@googlegroups.com.
  To view this discussion on the web visit
 
 https://groups.google.com/d/msgid/elasticsearch/DFFDD1A6-9F76-4AC0-8211-95C47CC5CAC7%40patpro.net
  .
  For more options, visit https://groups.google.com/d/optout.
 
 
 
 
  --
  Met vriendelijke groet,
 
  Martijn van Groningen
 
  --
  You received this message because you are subscribed to the Google
 Groups elasticsearch group.
  To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearch+unsubscr...@googlegroups.com.
  To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CA%2BA76Ty1RLHNButgkgYZ3pt_L0ygtonn7y8QpM%3D-0ttC%2BM84gQ%40mail.gmail.com
 .
  For more options, visit https://groups.google.com/d/optout.

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/34529AB7-AD03-404F-9787-60BD6B90E1A4%40patpro.net
 .
 For more options, visit https://groups.google.com/d/optout.




-- 
Met vriendelijke groet,

Martijn van Groningen

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CA%2BA76TxbSpXgVwRmfF5X3%2BDzsWgz8iNgaTjWXOP7iT1NdfHLow%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


which class file triggers writing of segments.gen / segments_1

2014-08-29 Thread Jason Wee
Hello people,

Does anybody know which class/component in Elasticsearch triggers the writing 
of segments.gen and segments_1? I'm currently using Elasticsearch version 
1.2.1. It would be great if you could provide a link pinpointing which line in 
the class does that.

Thank you.

/Jason

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bdfc3860-cb26-4f21-9597-4f500eb950e2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: which class file triggers writing of segments.gen / segments_1

2014-08-29 Thread joergpra...@gmail.com
This is Lucene, when indexing starts. Look at the SegmentInfos class:
https://lucene.apache.org/core/4_9_0/core/org/apache/lucene/index/SegmentInfos.html

Jörg


On Fri, Aug 29, 2014 at 2:38 PM, Jason Wee peich...@gmail.com wrote:

 Hello people,

 Does anybody know which class/component in Elasticsearch triggers the writing
 of segments.gen and segments_1? I'm currently using Elasticsearch version
 1.2.1. It would be great if you could provide a link pinpointing which line in
 the class does that.

 Thank you.

 /Jason

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/bdfc3860-cb26-4f21-9597-4f500eb950e2%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/bdfc3860-cb26-4f21-9597-4f500eb950e2%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoG%3Ddauci86-7Y-RaN%2BJW94kqXU%3DwTA3kgxLO5Mj%3DLL0aQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk UDP API

2014-08-29 Thread joergpra...@gmail.com
Maybe it is the line feeds in mydata.json; perhaps you are not using UNIX
LFs (a single \n)?
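A quick way to check and fix that, sketched here on a small stand-in file rather than the real mydata.json:

```shell
# stand-in bulk file written with Windows CRLF line endings
printf '{"index":{"_index":"test","_type":"doc"}}\r\n{"field":"value"}\r\n' > demo_bulk.json

# count carriage returns; a well-formed bulk file should contain none
tr -dc '\r' < demo_bulk.json | wc -c

# strip them, leaving plain UNIX \n line feeds
tr -d '\r' < demo_bulk.json > demo_bulk_unix.json
```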

Jörg


On Fri, Aug 29, 2014 at 11:36 AM, Bart Vandewoestyne 
bart.vandewoest...@gmail.com wrote:

 I'm trying to index data using the bulk UDP API on a single node
 Elasticsearch 1.3.2.  In my elasticsearch config I have

 bulk.udp.enabled: true

 My bulk file has 85000 documents and has the following characteristics:

 bart@hp-g7-02:~/git/data$ ls -al mydata.json
 -rw-rw-r-- 1 bart bart 97818287 Aug 28 15:43 mydata.json

 bart@hp-g7-02:~/git/data$ wc -l mydata.json
 170001 mydata.json

 bart@hp-g7-02:~/git/data$ file mydata.json
 mydata.json: UTF-8 Unicode English text, with very long lines

 Indexing the data using the bulk API described at
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
 works.  I see the documents in my elasticsearch store once the bulk upload
 is finished.

 However, if I use the same bulk file and try to index it using the command

 cat mydata.json | nc -w 0 -u localhost 9700

 then only 1 document gets indexed, and I see lots of parsing errors like
 the following in my log files:

 [2014-08-29 11:28:41,649][WARN ][bulk.udp ] [Mysterio]
 failed to execute bulk request
 org.elasticsearch.common.jackson.core.JsonParseException: Unrecognized
 token '_index': was expecting ('true', 'false' or 'null')
  at [Source: [B@656f95ce; line: 1, column: 15]
 at org.elasticsearch.common.jackson.core.JsonParser.
 _constructError(JsonParser.java:1419)
 at org.elasticsearch.common.jackson.core.base.ParserMinimalBase.
 _reportError(ParserMinimalBase.java:508)
 at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser
 ._reportInvalidToken(UTF8StreamJsonParser.java:3201)
 at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser
 ._handleUnexpectedValue(UTF8StreamJsonParser.java:2360)
 at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser
 ._nextTokenNotInObject(UTF8StreamJsonParser.java:794)
 at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser
 .nextToken(UTF8StreamJsonParser.java:690)
 at org.elasticsearch.common.xcontent.json.JsonXContentParser.
 nextToken(JsonXContentParser.java:50)
 at org.elasticsearch.action.bulk.BulkRequest.add(BulkRequest.java:
 266)
 at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.
 java:256)
 at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.
 java:252)
 at org.elasticsearch.bulk.udp.BulkUdpService$Handler.
 messageReceived(BulkUdpService.java:181)
 at org.elasticsearch.common.netty.channel.
 SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.
 java:70)
 at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.
 sendUpstream(DefaultChannelPipeline.java:564)
 at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.
 sendUpstream(DefaultChannelPipeline.java:559)
 at org.elasticsearch.common.netty.channel.Channels.
 fireMessageReceived(Channels.java:268)
 at org.elasticsearch.common.netty.channel.socket.nio.
 NioDatagramWorker.read(NioDatagramWorker.java:98)
 at org.elasticsearch.common.netty.channel.socket.nio.
 AbstractNioWorker.process(AbstractNioWorker.java:108)
 at org.elasticsearch.common.netty.channel.socket.nio.
 AbstractNioSelector.run(AbstractNioSelector.java:318)
 at org.elasticsearch.common.netty.channel.socket.nio.
 AbstractNioWorker.run(AbstractNioWorker.java:89)
 at org.elasticsearch.common.netty.channel.socket.nio.
 NioDatagramWorker.run(NioDatagramWorker.java:343)
 at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(
 ThreadRenamingRunnable.java:108)
 at org.elasticsearch.common.netty.util.internal.
 DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(
 ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(
 ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

 I find it strange that things work using the usual bulk API, but not with
 the bulk UDP API.

 Am I overlooking something or doing something wrong?

 Thanks,
 Bart

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/6a676c4f-afd1-48a1-ab40-8c258aa3c54e%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/6a676c4f-afd1-48a1-ab40-8c258aa3c54e%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.



Does transport client do scatter gather?

2014-08-29 Thread John Smith
Just as the subject asks or only the node client can do scatter gather?

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b5274032-c142-46df-91e2-f451ab9c069e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Refactoring idea for buildShardFailures()?

2014-08-29 Thread Na Meng
Hi Sir/Madam,

I'm doing some research in automatic refactoring suggestion. By observing 
the co-change pattern of some similar code, we would like to develop a tool 
to suggest possible refactorings to apply in order to extract out common 
code while parameterizing any difference between them.

I have examined the code snippets in class 
org.elasticsearch.action.search.type.TransportSearchScrollScanAction.AsyncAction,
org.elasticsearch.action.search.type.TransportSearchScrollQueryAndFetchAction.AsyncAction,
 
and
org.elasticsearch.action.search.type.TransportSearchScrollQueryThenFetchAction.AsyncAction.

I notice that all of the three classes have method buildShardFailures() 
defined. The method bodies are pretty similar and they experience similar 
or same changes at least once in the version history. Do you think it is a 
good idea or bad idea to extract a method out of the methods?

No matter whether you would like to extract a method or not, would you like 
to share the factors in your mind which affect your decision, such as 
complexity of refactoring, poor readability, poor maintainability, etc.? For 
each factor, how do you think it can affect your decision about using 
refactoring? If possible, any quantitative analysis will be great. For 
example, if the code size after refactoring is greater than that before 
refactoring, I won't do refactoring. Or if there are only two lines shared 
between two code snippets, I won't do refactoring, etc. 

Thanks a lot for your help! Your suggestion will be very valuable for our 
research.

Best regards,
Na Meng

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/24d9494b-c514-477b-8096-ae6dec8ca638%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-29 Thread tony . aponte
Thanks again and sorry to bother you guys, but I'm new to GitHub and don't 
know what to do from here.  Can you point me to the right place where I can 
take the next step to put this patch on my server?  I only know how to 
untar the tarball I downloaded from the main ES page.

Thanks.
Tony

On Wednesday, August 27, 2014 1:35:06 PM UTC-4, tony@iqor.com wrote:

 Kudos!

 Tony

 On Wednesday, August 27, 2014 1:16:11 PM UTC-4, Jörg Prante wrote:

 All praise should go to the fantastic Elasticsearch team who did not 
 hesitate to test the fix immediately and replaced it with a better working 
 solution, since the lzf-compress software is having weaknesses regarding 
 threadsafety.

 Jörg


 On Wed, Aug 27, 2014 at 7:01 PM, Ivan Brusic iv...@brusic.com wrote:

 Amazing job. Great work.

 -- 
 Ivan


 On Tue, Aug 26, 2014 at 12:41 PM, joerg...@gmail.com joerg...@gmail.com
  wrote:

 I fixed the issue by setting the safe LZF encoder in LZFCompressor and 
 opened a pull request 

 https://github.com/elasticsearch/elasticsearch/pull/7466

 Jörg


 On Tue, Aug 26, 2014 at 8:17 PM, joerg...@gmail.com joerg...@gmail.com
  wrote:

 Still broken with lzf-compress 1.0.3

 https://gist.github.com/jprante/d2d829b497db4963aea5

 Jörg


 On Tue, Aug 26, 2014 at 7:54 PM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 Thanks for the logstash mapping command. I can reproduce it now.

 It's the LZF encoder that bails out at 
 org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt

 which uses in turn sun.misc.Unsafe.getInt

 I have created a gist of the JVM crash file at 

 https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b
  
 There has been a fix in LZF lately 
 https://github.com/ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7

 for version 1.0.3 which has been released recently.

 I will build a snapshot ES version with LZF 1.0.3 and see if this 
 works...

 Jörg



 On Mon, Aug 25, 2014 at 11:30 PM, tony@iqor.com wrote:

 I captured a WireShark trace of the interaction between ES and 
 Logstash 1.4.1.  The error occurs even before my data is sent.  Can you 
 try 
 to reproduce it on your testbed with this message I captured?

 curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y

 Contents of file 'y':
 {
   "template" : "logstash-*",
   "settings" : {
     "index.refresh_interval" : "5s"
   },
   "mappings" : {
     "_default_" : {
       "_all" : { "enabled" : true },
       "dynamic_templates" : [ {
         "string_fields" : {
           "match" : "*",
           "match_mapping_type" : "string",
           "mapping" : {
             "type" : "string", "index" : "analyzed", "omit_norms" : true,
             "fields" : {
               "raw" : { "type" : "string", "index" : "not_analyzed", "ignore_above" : 256 }
             }
           }
         }
       } ],
       "properties" : {
         "@version" : { "type" : "string", "index" : "not_analyzed" },
         "geoip" : {
           "type" : "object", "dynamic" : true,
           "path" : "full",
           "properties" : {
             "location" : { "type" : "geo_point" }
           }
         }
       }
     }
   }
 }



 On Monday, August 25, 2014 3:53:18 PM UTC-4, tony@iqor.com 
 wrote:

 I have no plugins installed (yet) and only changed 
 es.logger.level to DEBUG in logging.yml. 

 elasticsearch.yml:
 cluster.name: es-AMS1Cluster
 node.name: KYLIE1
 node.rack: amssc2client02
 path.data: /export/home/apontet/elasticsearch/data
 path.work: /export/home/apontet/elasticsearch/work
 path.logs: /export/home/apontet/elasticsearch/logs
 network.host:    = sanitized line; file contains 
 actual server IP 
 discovery.zen.ping.multicast.enabled: false
 discovery.zen.ping.unicast.hosts: [s1, s2, s3, s5 , s6, 
 s7]   = Also sanitized

 Thanks,
 Tony




 On Saturday, August 23, 2014 6:29:40 AM UTC-4, Jörg Prante wrote:

 I tested a simple Hello World document on Elasticsearch 1.3.2 
 with Oracle JDK 1.7.0_17 64-bit Server VM, Sparc Solaris 10, default 
 settings.

 No issues.

 So I would like to know more about the settings in 
 elasticsearch.yml, the mappings, and the installed plugins.

 Jörg


 On Sat, Aug 23, 2014 at 11:25 AM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 I have some Solaris 10 Sparc V440/V445 servers available and can 
 try to reproduce over the weekend.

 Jörg


 On Sat, Aug 23, 2014 at 4:37 AM, Robert Muir 
 rober...@elasticsearch.com wrote:

 How big is it? Maybe i can have it anyway? I pulled two ancient 
 ultrasparcs out of my closet to try to debug your issue, but 
 unfortunately 
 they are a pita to work with (dead nvram battery on both, zeroed 
 mac 
 address, etc.) Id still love to get to the bottom of this.
  On Aug 22, 2014 3:59 PM, tony@iqor.com wrote:

 Hi Adrien,
 It's a bunch of garbled binary data, basically a dump of the 
 process image.
 Tony


 On Thursday, August 21, 2014 6:36:12 PM UTC-4, Adrien Grand 
 wrote:

 Hi Tony,

 Do you have more information in the core dump file? (cf. the 
 Core dump written line that 

Re: Stop words and Keyword tokenizer

2014-08-29 Thread Germán Carrillo
Thanks Ivan! I'll test which way fits better to my needs.
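
For anyone following along, the chain quoted below (lowercase → asciifolding → stop word removal on the single keyword token) can be simulated outside Elasticsearch. This is only a rough Python sketch, not the Lucene filters themselves: `unicodedata` stands in for the asciifolding filter, and the regex plays the role a pattern_replace token filter would play on the one token the keyword tokenizer emits.

```python
import re
import unicodedata

# Stop word list from the thread.
STOPWORDS = ["la", "el", "de", "del", "los", "las", "jurisdiccion"]

def lowercase(text):
    return text.lower()

def ascii_fold(text):
    # Rough stand-in for Lucene's asciifolding: drop combining marks.
    return "".join(c for c in unicodedata.normalize("NFD", text)
                   if not unicodedata.combining(c))

def remove_stopwords(text):
    # What a pattern_replace token filter would do to the single token.
    pattern = r"\b(?:" + "|".join(STOPWORDS) + r")\b\s*"
    return re.sub(pattern, "", text).strip()

text = "El corregimiento de Mulaló, jurisdicción del municipio de Yumbo (Valle del Cauca)"
token = remove_stopwords(ascii_fold(lowercase(text)))
print(token)  # corregimiento mulalo, municipio yumbo (valle cauca)
```

The output matches the "after removing stop words" line in the example, which is why a single pattern_replace filter built from the stop word list is a workable alternative to hand-written regexes per place name.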



2014-08-28 17:12 GMT-05:00 Ivan Brusic i...@brusic.com:

 Character filters are executed before the tokenizer, so only something in
 that family of filters would work if you plan to continue using the keyword
 tokenizer.


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-mapping-charfilter.html

 The mapping char filter might be a better match if your list is not in
 regex form. I use the mapping char filter to remove copyright, trademark,
 and a whole list of other characters from my content.

 Cheers,

 Ivan


 On Thu, Aug 28, 2014 at 2:33 PM, Germán Carrillo 
 carrillo.ger...@gmail.com wrote:

 Ivan, yes, I'm aware I would obtain another text, that's fine. Even more,
 my docs have a display field to be returned to users after a search. For
 the example given above, the display value would be something like:
 Mulaló, Yumbo, Valle del Cauca.

 Itamar, I've actually considered several options. I think a synonym file
 would be too big. I gave you 11 equivalent terms (you might've noticed I
 could have continued to give you around 30 equivalent ways), but I didn't
 mention place names (alone) have their corresponding synonyms, alternate
 names, abbreviations, and vernacular names. There could be 10k different
 places (docs) in the index. :D  Also, taking into account every single case
 into the synonym file seems to be sub-optimal. Really, I intend to
 normalize a large number of ways of expressing place hierarchy into a few
 ways. Otherwise I'd have to build very large lists for each place I add to
 the index, and nothing prevents I'm missing a weird case. BTW, handling
 hierarchy is a must, otherwise result disambiguation would be a nightmare
 for users.

 Thanks for all the discussion, it's certainly valuable to read an
 expert's opinion.

 Back to my very first question, is the pattern replace token filter the
 only way to remove stop words from tokens obtained from a keyword tokenizer?
 Are those regular expressions not very performant?


 2014-08-28 15:49 GMT-05:00 Ivan Brusic i...@brusic.com:

  You mentioned in your original post I'd like to obtain the original
 text without stop words

 The stopword-less phrase will indeed be present in the index after the
 analysis phrase, however, when you ask for this content back as a result of
 a query, the original text will be returned. What is indexed is not
 necessarily what is stored/returned.

 Cheers,

 Ivan


 On Thu, Aug 28, 2014 at 12:30 PM, Germán Carrillo 
 carrillo.ger...@gmail.com wrote:

 Thanks Ivan,

 do you mean what I obtain from a request such as

 curl -XGET
 'localhost:9200/_analyze?tokenizer=keywordfilters=lowercase,my_ascii_folding,my_stopwords'
 -d 'El corregimiento de Mulaló, jurisdicción del municipio de Yumbo
 (Valle del Cauca)'

 is not what will be present in the index after the analysis process? If
 so, how could I check whether the stop words filter is being (will be)
 applied to a sample phrase?


 2014-08-28 14:03 GMT-05:00 Ivan Brusic i...@brusic.com:

  Also note that the content returned will still contain the stop
 words. Only the inverted index will contain the stopword-less content.

 --
 Ivan


 On Thu, Aug 28, 2014 at 11:55 AM, Itamar Syn-Hershko 
 ita...@code972.com wrote:

 What would be the usecase for such a process (removing stop words
 without tokenization)?

 This may be a good read btw:
 http://www.elasticsearch.org/blog/stop-stopping-stop-words-a-look-at-common-terms-query/

 --

 Itamar Syn-Hershko
 http://code972.com | @synhershko https://twitter.com/synhershko
 Freelance Developer  Consultant
 Author of RavenDB in Action http://manning.com/synhershko/


 On Thu, Aug 28, 2014 at 9:48 PM, German Carrillo 
 carrillo.ger...@gmail.com wrote:

 Hi all,


 I'm looking for a way to remove stop words from tokens returned by a
 keyword tokenizer, i.e., I'd like to obtain the original text without 
 stop
 words after the analysis process.

 Sample data looks like: El corregimiento de
 Mulaló, jurisdicción del municipio de Yumbo (Valle del Cauca)
 After the lowercase token filter:   el corregimiento de
 mulaló, jurisdicción del municipio de yumbo (valle del cauca)
 After the ascii folding token filter:el corregimiento de
 mulalo, jurisdiccion del municipio de yumbo (valle del cauca)
 After removing stop words:   corregimiento mulalo,
 municipio yumbo (valle cauca)

 The stop words (currently) are:  [la, el, de, del,
 los, las, jurisdiccion]

 Is the pattern replace token filter the only (or best) way to go for
 such a task?

 I'd really like to avoid writing custom regular expressions rather
 than specifying a stop words list, which I know would work perfectly 
 fine
 for other tokenizers.


 Regards,

 Germán

 --
 You received this message because you are subscribed to the Google
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from 

Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-29 Thread joergpra...@gmail.com
Do you want to build from source? Or do you want to install a fresh binary?

At jenkins.elasticsearch.org I can not find any snapshot builds but it may
be just me.

It would be a nice add-on to provide snapshot builds for users that eagerly
await bug fixes or take a ride on the bleeding edge before the next release
arrives, without release notes etc.

Jörg


On Fri, Aug 29, 2014 at 4:29 PM, tony.apo...@iqor.com wrote:

 Thanks again and sorry to bother you guys but I'm new to Github and don't
 know what do do from here.  Can you point me to the right place where I can
 take the next step to put this patch on my server?  I only know how to
 untar the tarball I downloaded from the main ES page.

 Thanks.
 Tony


 On Wednesday, August 27, 2014 1:35:06 PM UTC-4, tony@iqor.com wrote:

 Kudos!

 Tony

 On Wednesday, August 27, 2014 1:16:11 PM UTC-4, Jörg Prante wrote:

 All praise should go to the fantastic Elasticsearch team who did not
 hesitate to test the fix immediately and replaced it with a better working
 solution, since the lzf-compress software is having weaknesses regarding
 threadsafety.

 Jörg


 On Wed, Aug 27, 2014 at 7:01 PM, Ivan Brusic iv...@brusic.com wrote:

 Amazing job. Great work.

 --
 Ivan


 On Tue, Aug 26, 2014 at 12:41 PM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 I fixed the issue by setting the safe LZF encoder in LZFCompressor and
 opened a pull request

 https://github.com/elasticsearch/elasticsearch/pull/7466

 Jörg


 On Tue, Aug 26, 2014 at 8:17 PM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 Still broken with lzf-compress 1.0.3

 https://gist.github.com/jprante/d2d829b497db4963aea5

 Jörg


 On Tue, Aug 26, 2014 at 7:54 PM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 Thanks for the logstash mapping command. I can reproduce it now.

 It's the LZF encoder that bails out at org.elasticsearch.common.
 compress.lzf.impl.UnsafeChunkEncoderBE._getInt

 which uses in turn sun.misc.Unsafe.getInt

 I have created a gist of the JVM crash file at

 https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b

 There has been a fix in LZF lately https://github.com/
 ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7

 for version 1.0.3 which has been released recently.

 I will build a snapshot ES version with LZF 1.0.3 and see if this
 works...

 Jörg



 On Mon, Aug 25, 2014 at 11:30 PM, tony@iqor.com wrote:

 I captured a WireShark trace of the interaction between ES and
 Logstash 1.4.1.  The error occurs even before my data is sent.  Can 
 you try
 to reproduce it on your testbed with this message I captured?

 curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y

 Contests of file 'y:
 {  template : logstash-*,  settings : {
  index.refresh_interval : 5s  },  mappings : {_default_ : {
   _all : {enabled : true},   dynamic_templates : [ {
 string_fields : {   match : *,   
 match_mapping_type
 : string,   mapping : { type : string, 
 index
 : analyzed, omit_norms : true,   fields : {
   raw : {type: string, index : not_analyzed, ignore_above :
 256}   }   } }   } ],   
 properties :
 { @version: { type: string, index: not_analyzed },
   geoip  : {   type : object, dynamic: 
 true,
   path: full, properties : {
 location : { type : geo_point } } }   }  
   }
  }}



 On Monday, August 25, 2014 3:53:18 PM UTC-4, tony@iqor.com
 wrote:

 I have no plugins installed (yet) and only changed
 es.logger.level to DEBUG in logging.yml.

 elasticsearch.yml:
 cluster.name: es-AMS1Cluster
 node.name: KYLIE1
 node.rack: amssc2client02
 path.data: /export/home/apontet/elasticsearch/data
 path.work: /export/home/apontet/elasticsearch/work
 path.logs: /export/home/apontet/elasticsearch/logs
 network.host:    = sanitized line; file contains
 actual server IP
 discovery.zen.ping.multicast.enabled: false
 discovery.zen.ping.unicast.hosts: [s1, s2, s3, s5 , s6,
 s7]   = Also sanitized

 Thanks,
 Tony




 On Saturday, August 23, 2014 6:29:40 AM UTC-4, Jörg Prante wrote:

 I tested a simple Hello World document on Elasticsearch 1.3.2
 with Oracle JDK 1.7.0_17 64-bit Server VM, Sparc Solaris 10, default
 settings.

 No issues.

 So I would like to know more about the settings in
 elasticsearch.yml, the mappings, and the installed plugins.

 Jörg


 On Sat, Aug 23, 2014 at 11:25 AM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 I have some Solaris 10 Sparc V440/V445 servers available and can
 try to reproduce over the weekend.

 Jörg


 On Sat, Aug 23, 2014 at 4:37 AM, Robert Muir 
 rober...@elasticsearch.com wrote:

 How big is it? Maybe i can have it anyway? I pulled two ancient
 ultrasparcs out of my closet to try to debug your issue, but 
 unfortunately
 they are a pita to work with (dead nvram battery on both, zeroed 
 mac
 address, etc.) Id still love to get to the bottom of 

Re: which class file trigger writing of segments.gen / segments_1

2014-08-29 Thread Jason Wee
Thanks Jörg,

read this link
https://lucene.apache.org/core/4_8_1/core/org/apache/lucene/index/SegmentInfos.html
, very informative.

Found a few spots that call the class SegmentInfos, below are them.

https://github.com/elasticsearch/elasticsearch/blob/v1.2.1/src/main/java/org/elasticsearch/index/gateway/local/LocalIndexShardGateway.java
https://github.com/elasticsearch/elasticsearch/blob/v1.2.1/src/main/java/org/elasticsearch/common/lucene/Lucene.java
https://github.com/elasticsearch/elasticsearch/blob/v1.2.1/src/main/java/org/elasticsearch/index/engine/internal/InternalEngine.java
https://github.com/elasticsearch/elasticsearch/blob/v1.2.1/src/main/java/org/elasticsearch/index/merge/policy/ElasticsearchMergePolicy.java
https://github.com/elasticsearch/elasticsearch/blob/v1.2.1/src/main/java/org/elasticsearch/index/snapshots/blobstore/BlobStoreIndexShardRepository.java

I understand that both segments files are written by Lucene, but do you
know which Elasticsearch class eventually triggers the writing of the
segments files during indexing?

/Jason


On Fri, Aug 29, 2014 at 8:49 PM, joergpra...@gmail.com 
joergpra...@gmail.com wrote:

 This is Lucene, when indexing starts. Look at the SegmentInfos class
 https://lucene.apache.org/core/4_9_0/core/org/apache/lucene/index/SegmentInfos.html

 Jörg


 On Fri, Aug 29, 2014 at 2:38 PM, Jason Wee peich...@gmail.com wrote:

 Hello people,

 Anybody know which class/component in elastic search trigger writing
 of segments.gen and segments_1? I'm currently using elastic search version
 1.2.1. It would be great if you can provide link pin point which line in
 the class does that.

 Thank you.

 /Jason

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/bdfc3860-cb26-4f21-9597-4f500eb950e2%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/bdfc3860-cb26-4f21-9597-4f500eb950e2%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAKdsXoG%3Ddauci86-7Y-RaN%2BJW94kqXU%3DwTA3kgxLO5Mj%3DLL0aQ%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAKdsXoG%3Ddauci86-7Y-RaN%2BJW94kqXU%3DwTA3kgxLO5Mj%3DLL0aQ%40mail.gmail.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




Re: Not able to fulltext index Microsoft Office documents - PDF works fine

2014-08-29 Thread feenz
Hi David, 
I am currently using elasticsearch-1.3.1. Will mapper-attachments-2.3.2
be compatible with my version of ES, or will I have to update?

Thanks,

- Kyle



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Not-able-to-fulltext-index-Microsoft-Office-documents-PDF-works-fine-tp4062325p4062665.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: Not able to fulltext index Microsoft Office documents - PDF works fine

2014-08-29 Thread David Pilato
It will work with 1.3.1.
You should update to 1.3.2 though because we fixed some issues in this version.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 Le 29 août 2014 à 16:07, feenz kfeeney5...@gmail.com a écrit :
 
 Hi David, 
 I am currently using elasticsearch-1.3.1. Will the mapper-attchements-2.3.2
 be compatible with my version of ES or will have have to update?
 
 Thanks,
 
 - Kyle
 
 
 
 --
 View this message in context: 
 http://elasticsearch-users.115913.n3.nabble.com/Not-able-to-fulltext-index-Microsoft-Office-documents-PDF-works-fine-tp4062325p4062665.html
 Sent from the ElasticSearch Users mailing list archive at Nabble.com.
 
 -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/1409321264321-4062665.post%40n3.nabble.com.
 For more options, visit https://groups.google.com/d/optout.



Multi-field collapsing

2014-08-29 Thread Brian Hudson
I have a use case which requires collapsing on multiple fields.

As a simple example, assume I have some movie documents indexed with the 
fields Director, Actor, Title, and Release Date. I want to be able to 
collapse on Director and Actor, getting the most recent movie (as indicated 
by Release Date).

I think the new top hits aggregation gets me most of what I need. I can 
create a terms aggregation on Director, with a sub terms aggregation on 
Actor, and add a top hits aggregation to that (size 1). Would this be the 
proper approach? By traversing the aggregations I can get all of the hits 
that I want; however, I can't have Elasticsearch sort or page them.

It's almost like I'd need a hitCollector aggregation which would collect 
all search hits generated by its sub-aggregations and allow me to specify 
sort and paging information at that level. Thoughts?
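
The bucket structure described above can be sketched client-side. A rough Python simulation (field names are taken from the example; this is not an Elasticsearch API call): group hits by (director, actor) and keep the newest per group, which is what the terms/sub-terms/top_hits(size 1, sorted by release date) tree returns, then do the cross-bucket sort and paging that the aggregation framework itself cannot.

```python
from itertools import groupby

movies = [
    {"director": "D1", "actor": "A1", "title": "Old",   "release": "2001-01-01"},
    {"director": "D1", "actor": "A1", "title": "New",   "release": "2010-01-01"},
    {"director": "D1", "actor": "A2", "title": "Solo",  "release": "2005-06-01"},
    {"director": "D2", "actor": "A1", "title": "Other", "release": "2008-03-01"},
]

def collapse(hits, page=0, size=10):
    # One "bucket" per (director, actor); keep the most recent movie in
    # each, mirroring top_hits with size 1 sorted by release date desc.
    key = lambda h: (h["director"], h["actor"])
    collapsed = [max(g, key=lambda h: h["release"])
                 for _, g in groupby(sorted(hits, key=key), key=key)]
    # The cross-bucket sort and paging step done client-side.
    collapsed.sort(key=lambda h: h["release"], reverse=True)
    return collapsed[page * size:(page + 1) * size]

top = collapse(movies)
print([m["title"] for m in top])  # ['New', 'Other', 'Solo']
```

This makes the gap concrete: everything up to the `collapsed.sort(...)` line maps onto the aggregation tree, while the sort/page step has no server-side equivalent across buckets.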

Brian



Re: Does transport client do scatter gather?

2014-08-29 Thread joergpra...@gmail.com
I'm not exactly sure what you mean by scatter-gather, but yes, both clients
can execute requests on all nodes of the cluster.

Jörg


On Fri, Aug 29, 2014 at 3:43 PM, John Smith java.dev@gmail.com wrote:

 Just as the subject asks or only the node client can do scatter gather?

 Thanks

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/b5274032-c142-46df-91e2-f451ab9c069e%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/b5274032-c142-46df-91e2-f451ab9c069e%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




Re: What the heck is this search?? :)

2014-08-29 Thread Chris Neal
Hi Boaz,

Thanks for the reply. :)  It's not a problem per se.  I'm working through
performance/memory issues and turned on the slow log file, and that one
popped up.  It's a problem because it's slow, but it's not causing cluster
stability issues!

It's interesting that you think it is Kibana though.  I removed the Head
plugin for 3 days and didn't see that query logged once, so I was pretty
sure it was the culprit!  Maybe it was just coincidence that whatever in
Kibana was doing it didn't happen then either.  Just my luck. ;)

Thanks again.
Chris


On Thu, Aug 28, 2014 at 3:48 PM, Boaz Leskes b.les...@gmail.com wrote:


 Hi Chris,

 This is actually Kibana. The reason it uses query_string is to allow
 people some kind of syntax in their query with no query parsing on the
 client side. Just a decision which I guess was made long ago to keep things
 simple.

 Is this a problem for you in any way?

 Cheers,
 Boaz

 On Thursday, August 21, 2014 6:37:02 PM UTC+2, Chris Neal wrote:

 Done.  Will report back.

 Thank you!



 On Thu, Aug 21, 2014 at 11:27 AM, Itamar Syn-Hershko ita...@code972.com
 wrote:

 I'm going to bet on Head. Disable it and see what happens.

 --

 Itamar Syn-Hershko
 http://code972.com | @synhershko https://twitter.com/synhershko
 Freelance Developer  Consultant
 Author of RavenDB in Action http://manning.com/synhershko/


 On Thu, Aug 21, 2014 at 7:22 PM, Chris Neal chris.n...@derbysoft.net
 wrote:

 Thanks guys for the thoughts.  Plugins didn't even occur to me, but
 they should have.

 We've got Marvel, Head, and ElasticHQ installed.

 Is there some way to tell where the search is coming from?  Something
 like an HTTP access log or something?

 Thanks again for your time!
 Chris


 On Wed, Aug 20, 2014 at 3:57 PM, Itamar Syn-Hershko ita...@code972.com
  wrote:

 I thought of Kibana because there's a faceting operation on the _type
 field. But I doubt neither Marvel nor Kibana would issue such an awful
 query (notice the fquery bit, too).

 Any part of your system (plugin or other) which might want to look at
 the types of documents added to an ES index?

 --

 Itamar Syn-Hershko
 http://code972.com | @synhershko https://twitter.com/synhershko
 Freelance Developer  Consultant
 Author of RavenDB in Action http://manning.com/synhershko/


 On Wed, Aug 20, 2014 at 11:53 PM, Ivan Brusic i...@brusic.com wrote:

 Very strange query indeed. Wildcard search filtered by a match_all.
 What?!?

 It is not Elasticsearch, but perhaps some plugin. Itamar mentioned
 Kibana, although you did not mention it in your post. Any other plugins?
 Marvel?

 --
 Ivan


 On Wed, Aug 20, 2014 at 12:43 PM, Itamar Syn-Hershko 
 ita...@code972.com wrote:

 There is no such thing as query internal to ES, if you see this in
 the logs you have a client making it. I would point to a Kibana instance
 but I'm pretty sure Kibana won't use a query_string query like this.

 And yes this is quite an expensive query (and facets) to run on a
 decent sized installation.

 --

 Itamar Syn-Hershko
 http://code972.com | @synhershko https://twitter.com/synhershko
 Freelance Developer  Consultant
 Author of RavenDB in Action http://manning.com/synhershko/


 On Wed, Aug 20, 2014 at 10:14 PM, Chris Neal 
 chris.n...@derbysoft.net wrote:

 Hi guys,

 I'm working through some performance concerns in my cluster, and I
 turned on the slow log feature.  I'm seeing this in the
 index_search_slowlog.log log:

 [2014-08-20 06:37:52,734][INFO ][index.search.slowlog.query]
 [elasticsearch-ip-10-0-0-41] [index-20140731][0] took[6s],
 took_millis[6081], types[], stats[], search_type[QUERY_THEN_FETCH],
 total_shards[86], source[{"facets":{"terms":{"terms":{"field":"_type",
 "size":100,"order":"count","exclude":[]},"facet_filter":{"fquery":{
 "query":{"filtered":{"query":{"bool":{"should":[{"query_string":
 {"query":"*"}}]}},"filter":{"bool":{"must":[{"match_all":{}}]}}}}}}}},
 "size":0}], extra_source[],

 Is that a user generated search, or something internal to ES maybe?
  I can't even tell what it's trying to do.  It seems to hit every one 
 of my
 indexes though, as the same search query is logged 63 times in a one 
 minute
 period.

 Any ideas what this is?  Is it something to be concerned about?

 Thanks for the help!
 Chris

 --
 You received this message because you are subscribed to the Google
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it,
 send an email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/
 CAND3Dpj7BzbaNva9B7JNFOeeaC9SrYWCEnvzTJgx2-AQeT478w%40mail.
 gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAND3Dpj7BzbaNva9B7JNFOeeaC9SrYWCEnvzTJgx2-AQeT478w%40mail.gmail.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to the Google
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from 

Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-29 Thread Ivan Brusic
The snapshot repo is still active, but it is a bit behind and does not
include this patch:

https://oss.sonatype.org/content/repositories/snapshots/org/elasticsearch/elasticsearch/

-- 
Ivan


On Fri, Aug 29, 2014 at 8:21 AM, joergpra...@gmail.com 
joergpra...@gmail.com wrote:

 Do you want to build from source? Or do you want to install a fresh binary?

 At jenkins.elasticsearch.org I can not find any snapshot builds but it
 may be just me.

 It would be a nice add-on to provide snapshot builds for users that
 eagerly await bug fixes or take a ride on the bleeding edge before the next
 release arrives, without release notes etc.

 Jörg


 On Fri, Aug 29, 2014 at 4:29 PM, tony.apo...@iqor.com wrote:

 Thanks again and sorry to bother you guys but I'm new to Github and don't
 know what do do from here.  Can you point me to the right place where I can
 take the next step to put this patch on my server?  I only know how to
 untar the tarball I downloaded from the main ES page.

 Thanks.
 Tony


 On Wednesday, August 27, 2014 1:35:06 PM UTC-4, tony@iqor.com wrote:

 Kudos!

 Tony

 On Wednesday, August 27, 2014 1:16:11 PM UTC-4, Jörg Prante wrote:

 All praise should go to the fantastic Elasticsearch team who did not
 hesitate to test the fix immediately and replaced it with a better working
 solution, since the lzf-compress software is having weaknesses regarding
 threadsafety.

 Jörg


 On Wed, Aug 27, 2014 at 7:01 PM, Ivan Brusic iv...@brusic.com wrote:

 Amazing job. Great work.

 --
 Ivan


Re: EL setup for fulltext search

2014-08-29 Thread Ivan Brusic
That output does not look like something generated by the standard
analyzer, since it contains uppercase letters and various non-word
characters such as '='.

Your two analysis requests will differ since the second one contains the
default word_delimiter filter instead of your custom my_word_delimiter.
What you are trying to achieve is somewhat difficult, but you can get there
if you keep on tweaking. :) Try using a pattern tokenizer instead of the
whitespace tokenizer if you want more control over word boundaries.

-- 
Ivan

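One detail worth flagging (my observation, not something confirmed in the thread): Elasticsearch's analysis settings use the singular key `filter`, both for the token-filter list inside a custom analyzer and for the section that defines custom filters, and the analyzer in the template quoted below references `word_delimiter` rather than the custom `my_word_delimiter`. A corrected settings fragment might look like this (index name and filter order kept from the thread; treat it as a sketch, not a confirmed fix):

```shell
# Hypothetical corrected fragment -- note "filter" (singular) in both
# places, and the reference to the custom my_word_delimiter filter:
curl -XPUT 'localhost:9200/bogstash-1' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "msg_excp_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["my_word_delimiter", "lowercase",
                     "asciifolding", "shingle", "standard"]
        }
      },
      "filter": {
        "my_word_delimiter": {
          "type": "word_delimiter",
          "preserve_original": true
        }
      }
    }
  }
}'
```

This would also explain the symptom below: with the misspelled keys ignored, only the whitespace tokenizer runs, which matches the uppercase, '='-containing tokens in the output.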

On Fri, Aug 29, 2014 at 1:48 AM, Marc mn.off...@googlemail.com wrote:

 Hi Ivan,

 thanks again. I have tried so and found a reasonable combination.
 Nevertheless, when I now try to use the analyze api with an index that has
 the said analyzer defined via template it doesn't seem to apply:

 This is the complete template:
 {
 template: bogstash-*,
 settings: {
 index.number_of_replicas: 0,
 analysis: {
 analyzer: {
 msg_excp_analyzer: {
 type: custom,
 tokenizer: whitespace,
 filters: [word_delimiter,
 lowercase,
 asciifolding,
 shingle,
 standard]
 }
 },
 filters: {
 my_word_delimiter: {
 type: word_delimiter,
 preserve_original: true
 },
 my_asciifolding: {
 type: asciifolding,
 preserve_original: true
 }
 }
 }
 },
 mappings: {
 _default_: {
 properties: {
 @excp: {
 type: string,
 index: analyzed,
 analyzer: msg_excp_analyzer
 },
 @msg: {
 type: string,
 index: analyzed,
 analyzer: msg_excp_analyzer
 }
 }
 }
 }
 }
 I create the index bogstash-1.
 Now I test the following:
 curl -XGET
 'localhost:9200/bogstash-1/_analyze?analyzer=msg_excp_analyzer&pretty=1' -d
 'Service=MyMDB.onMessage appId=cs Times=Me:22/Total:22 (updated
 attributes=gps_lng: 183731222/ gps_lat: 289309222/ )'
 and it returns:
 {
   tokens : [ {
 token : Service=MyMDB.onMessage,
 start_offset : 0,
 end_offset : 23,
 type : word,
 position : 1
   }, {
 token : appId=cs,
 start_offset : 24,
 end_offset : 32,
 type : word,
 position : 2
   }, {
 token : Times=Me:22/Total:22,
 start_offset : 33,
 end_offset : 53,
 type : word,
 position : 3
   }, {
 token : (updated,
 start_offset : 54,
 end_offset : 62,
 type : word,
 position : 4
   }, {
 token : attributes=gps_lng:,
 start_offset : 63,
 end_offset : 82,
 type : word,
 position : 5
   }, {
 token : 183731222/,
 start_offset : 83,
 end_offset : 93,
 type : word,
 position : 6
   }, {
 token : gps_lat:,
 start_offset : 94,
 end_offset : 102,
 type : word,
 position : 7
   }, {
 token : 289309222/,
 start_offset : 103,
 end_offset : 113,
 type : word,
 position : 8
   }, {
 token : ),
 start_offset : 114,
 end_offset : 115,
 type : word,
 position : 9
   } ]
 }
 Which is the output of a standard analyzer.
 Giving the tokenizer and filters in the analyze API directly works fine:
 curl -XGET
 'localhost:9200/_analyze?tokenizer=whitespace&filters=lowercase,word_delimiter,shingle,asciifolding,standard&pretty=1'
 -d 'Service=MyMDB.onMessage appId=cs Times=Me:22/Total:22 (updated
 attributes=gps_lng: 183731222/ gps_lat: 289309222/ )'
 This results in:
 {
   tokens : [ {
 token : service,
 start_offset : 0,
 end_offset : 7,
 type : word,
 position : 1
   }, {
 token : service mymdb,
 start_offset : 0,
 end_offset : 13,
 type : shingle,
 position : 1
   }, {
 token : mymdb,
 start_offset : 8,
 end_offset : 13,
 type : word,
 position : 2
   }, {
 token : mymdb onmessage,
 start_offset : 8,
 end_offset : 23,
 type : shingle,
 position : 2
   }, {
 token : onmessage,
 start_offset : 14,
 end_offset : 23,
 type : word,
 position : 3
   }, {
 token : onmessage appid,
 start_offset : 14,
 end_offset : 29,
 type : shingle,
 position : 3
   }, {
 token : appid,
 start_offset : 24,
 end_offset : 29,
 type : word,
 position : 4
   }, {
 token : appid cs,
 start_offset : 24,
 end_offset : 32,
 type : shingle,
 position : 4
   }, {
 token : cs,
 start_offset : 30,
 end_offset : 32,
 type : word,
 position : 5
   }, {
 token : cs times,
 start_offset : 30,
 end_offset : 38,
 type : shingle,
 

How big can/should you scale Elasticsearch

2014-08-29 Thread Rob Blackin
We are trying to implement a 5 TB, 10 Billion item Elasticsearch cluster. 
The key is an integer and the item data is fairly small.

We seem to run into issues around loading: indexing seems to slow down as the
index gets bigger.

We are doing this on EC2 i2.xlarge nodes.

How many documents/TB do you think we can load per node max?

So if we can do 2 Billion each then we need 5 nodes. We are trying to size 
it.  

Any advice is welcome. Even if it is that this is not a good thing to do :)

thanks

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3faa4de9-0a27-49dc-8f68-ceebd5569da9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-29 Thread tony . aponte
The easiest for me is to install fresh binaries but I'm not shy about 
learning about Maven while I build it from source.  

Thanks
Tony

On Friday, August 29, 2014 11:21:34 AM UTC-4, Jörg Prante wrote:

 Do you want to build from source? Or do you want to install a fresh binary?

 At jenkins.elasticsearch.org I can not find any snapshot builds but it 
 may be just me.

 It would be a nice add-on to provide snapshot builds for users that 
 eagerly await bug fixes or take a ride on the bleeding edge before the next 
 release arrives, without release notes etc.

 Jörg


 On Fri, Aug 29, 2014 at 4:29 PM, tony@iqor.com javascript: wrote:

 Thanks again and sorry to bother you guys but I'm new to Github and don't 
 know what do do from here.  Can you point me to the right place where I can 
 take the next step to put this patch on my server?  I only know how to 
 untar the tarball I downloaded from the main ES page.

 Thanks.
 Tony


 On Wednesday, August 27, 2014 1:35:06 PM UTC-4, tony@iqor.com wrote:

 Kudos!

 Tony

 On Wednesday, August 27, 2014 1:16:11 PM UTC-4, Jörg Prante wrote:

 All praise should go to the fantastic Elasticsearch team who did not 
 hesitate to test the fix immediately and replaced it with a better working 
 solution, since the lzf-compress software has weaknesses regarding 
 thread safety.

 Jörg


 On Wed, Aug 27, 2014 at 7:01 PM, Ivan Brusic iv...@brusic.com wrote:

 Amazing job. Great work.

 -- 
 Ivan


 On Tue, Aug 26, 2014 at 12:41 PM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 I fixed the issue by setting the safe LZF encoder in LZFCompressor 
 and opened a pull request 

 https://github.com/elasticsearch/elasticsearch/pull/7466

 Jörg


 On Tue, Aug 26, 2014 at 8:17 PM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 Still broken with lzf-compress 1.0.3

 https://gist.github.com/jprante/d2d829b497db4963aea5

 Jörg


 On Tue, Aug 26, 2014 at 7:54 PM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 Thanks for the logstash mapping command. I can reproduce it now.

 It's the LZF encoder that bails out at org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt

 which uses in turn sun.misc.Unsafe.getInt

 I have created a gist of the JVM crash file at 

 https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b
  
 There has been a fix in LZF lately: https://github.com/ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7

 for version 1.0.3 which has been released recently.

 I will build a snapshot ES version with LZF 1.0.3 and see if this 
 works...

 Jörg



 On Mon, Aug 25, 2014 at 11:30 PM, tony@iqor.com wrote:

 I captured a WireShark trace of the interaction between ES and 
 Logstash 1.4.1.  The error occurs even before my data is sent.  Can 
 you try 
 to reproduce it on your testbed with this message I captured?

 curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y

 Contents of file 'y':
 {
   "template": "logstash-*",
   "settings": {
     "index.refresh_interval": "5s"
   },
   "mappings": {
     "_default_": {
       "_all": {"enabled": true},
       "dynamic_templates": [{
         "string_fields": {
           "match": "*",
           "match_mapping_type": "string",
           "mapping": {
             "type": "string",
             "index": "analyzed",
             "omit_norms": true,
             "fields": {
               "raw": {"type": "string", "index": "not_analyzed", "ignore_above": 256}
             }
           }
         }
       }],
       "properties": {
         "@version": {"type": "string", "index": "not_analyzed"},
         "geoip": {
           "type": "object",
           "dynamic": true,
           "path": "full",
           "properties": {
             "location": {"type": "geo_point"}
           }
         }
       }
     }
   }
 }



 On Monday, August 25, 2014 3:53:18 PM UTC-4, tony@iqor.com 
 wrote:

 I have no plugins installed (yet) and only changed 
 es.logger.level to DEBUG in logging.yml. 

 elasticsearch.yml:
 cluster.name: es-AMS1Cluster
 node.name: KYLIE1
 node.rack: amssc2client02
 path.data: /export/home/apontet/elasticsearch/data
 path.work: /export/home/apontet/elasticsearch/work
 path.logs: /export/home/apontet/elasticsearch/logs
 network.host:    = sanitized line; file contains 
 actual server IP 
 discovery.zen.ping.multicast.enabled: false
 discovery.zen.ping.unicast.hosts: [s1, s2, s3, s5 , 
 s6, s7]   = Also sanitized

 Thanks,
  Tony




 On Saturday, August 23, 2014 6:29:40 AM UTC-4, Jörg Prante wrote:

 I tested a simple Hello World document on Elasticsearch 1.3.2 
 with Oracle JDK 1.7.0_17 64-bit Server VM, Sparc Solaris 10, 
 default 
 settings.

 No issues.

 So I would like to know more about the settings in 
 elasticsearch.yml, the mappings, and the installed plugins.

 Jörg


 On Sat, Aug 23, 2014 at 11:25 AM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 I have some Solaris 10 Sparc V440/V445 servers available and 
 can try to reproduce over the weekend.

 Jörg


 On Sat, Aug 23, 2014 at 

Re: Explicitly Copying Replica Shards That Fail to Start

2014-08-29 Thread Ivan Brusic
I used to apply that trick all the time with older versions of
Elasticsearch!  Thankfully I have not needed it in years.

-- 
Ivan

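The drop-and-re-add trick being referenced works by changing the replica count through the index settings API; a sketch (the index name is a placeholder):

```shell
# Drop replicas -- the bad replica shards are deleted cluster-wide:
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index": { "number_of_replicas": 0 }
}'

# ...then raise the count again; replicas are rebuilt from the primaries:
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index": { "number_of_replicas": 1 }
}'
```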

On Thu, Aug 28, 2014 at 3:53 PM, Mark Walkom ma...@campaignmonitor.com
wrote:

 Yep, the easiest way is to drop the replica and then add it back and see
 how you go.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 29 August 2014 08:40, David Kleiner david.klei...@gmail.com wrote:

 Greetings,

 I am still having a problem with recovery of 5 replica shards in 2
 indices of mine, 3-way cluster.  The replica shards fail to initialize and
 are jumping around two secondary nodes.  The primary shards are fine.

 What is my path to recovery?  Is copying master shard to secondary nodes
 a correct way?  I tried issuing routing commands to cancel
 recovery/allocation, it helped with some secondary shards but not with the
 5 in question.

 I also tried dumping index with failing secondary shards but two nodes
 crashed (well, lost connection to cluster) so dump failed.

 Would setting replica # to 0, copying masters to 2 nodes and setting
 replica #  to 1 a viable alternative?

 Thank you,

 David

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/8e7c4f11-2790-49d6-8c65-87e9aa05aa3b%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/8e7c4f11-2790-49d6-8c65-87e9aa05aa3b%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAEM624ZpLMWPg95joA023WT3hS7AsS1x4%3DN4E5UUWuyt_LAWtg%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAEM624ZpLMWPg95joA023WT3hS7AsS1x4%3DN4E5UUWuyt_LAWtg%40mail.gmail.com?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.




Re: Does transport client do scatter gather?

2014-08-29 Thread John Smith
According to this...

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-node.html

Non-data nodes (I assume a node client is equivalent to a non-data node) are
capable of scatter/gather searching. I was wondering whether the transport
client can do this as well?

2 - Does the transport client support routing if you specify a routing field? Or
does it always round-robin regardless?
On Aug 29, 2014 12:09 PM, joergpra...@gmail.com joergpra...@gmail.com
wrote:

 I'm not exactly sure what you mean by scatter-gather, but yes, both
 clients can execute requests on all nodes of the cluster.

 Jörg


 On Fri, Aug 29, 2014 at 3:43 PM, John Smith java.dev@gmail.com
 wrote:

 Just as the subject asks or only the node client can do scatter gather?

 Thanks

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/b5274032-c142-46df-91e2-f451ab9c069e%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/b5274032-c142-46df-91e2-f451ab9c069e%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to a topic in the
 Google Groups elasticsearch group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/elasticsearch/70zTmEuyWHE/unsubscribe.
 To unsubscribe from this group and all its topics, send an email to
 elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHdeW329GZgvOm3NG0NAuNvEUJftkSMzKTyCAzM1%2B8bFg%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHdeW329GZgvOm3NG0NAuNvEUJftkSMzKTyCAzM1%2B8bFg%40mail.gmail.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




Re: How big can/should you scale Elasticsearch

2014-08-29 Thread Arie
When you look at the guys @ found (https://www.found.no/pricing/), the
data on one ES server is about 8 times memory if it is to run smoothly,
though I do not know how reliable that figure is. If you have a lot of ES
nodes, then consider a dedicated master node without data; it's a best
practice I have read somewhere.

16 GB of memory equals 128 GB of data.
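Taken at face value, that 8x rule makes node-count estimation simple arithmetic. The sketch below applies it to the 5 TB figure from this thread; the RAM number is the i2.xlarge's roughly 30 GB, rounded down, and the rule itself is only a rough heuristic:

```shell
# Back-of-envelope node count from the "data = 8 x RAM" rule of thumb.
ram_gb=30                                  # approximate RAM of an EC2 i2.xlarge
data_per_node_gb=$(( ram_gb * 8 ))         # data a node can hold per the rule
total_data_gb=$(( 5 * 1024 ))              # 5 TB total, from this thread
nodes=$(( (total_data_gb + data_per_node_gb - 1) / data_per_node_gb ))  # ceiling
echo "$data_per_node_gb GB per node -> $nodes nodes"
```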

On Friday, August 29, 2014 7:27:28 PM UTC+2, Rob Blackin wrote:

 We are trying to implement a 5 TB, 10 Billion item Elasticsearch cluster. 
 The key is an integer and the item data is fairly small.

 We seem to run into issues around loading. Seems to slow down as the index 
 gets bigger.

 We are doing this on EC2 i2.xlarge nodes.

 How many documents/TB do you think we can load per node max?

 So if we can do 2 Billion each then we need 5 nodes. We are trying to size 
 it.  

 Any advice is welcome. Even if it is that this is not a good thing to do :)

 thanks




Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-29 Thread joergpra...@gmail.com
Quick guide:

- install Java 7 (or Java 8), Apache Maven, and git, also ensure internet
connection to the Maven central repo

- clone 1.3 branch only (you could also clone the whole repo and switch to
the branch): git clone https://github.com/elasticsearch/elasticsearch.git
--branch 1.3 --single-branch es-1.3

- enter folder es-1.3

- start build: mvn -DskipTests clean install

- wait a few minutes while Maven loads all dependent artifacts and compiles
~3000 source files

The result will be a complete build of all binaries. In the 'target'
folder, after the Build complete message of Maven, you will see a file
elasticsearch-VERSION.jar

VERSION is something like 1.3.3-SNAPSHOT. You can copy this file into
your existing Elasticsearch 1.3.x installation lib folder. Do not forget
to adjust bin/elasticsearch.in.sh to point to the new
elasticsearch-VERSION.jar file in the classpath configuration (at the top
lines). This must be the first jar on the classpath so it can patch Lucene
jars.

If you already have data in the existing Elasticsearch installation, I
recommend backing up everything before starting the new snapshot build - no
guarantees, use at your own risk.

Jörg
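The classpath adjustment might look roughly like this; the exact variable names and paths in bin/elasticsearch.in.sh vary by release, so treat it as an illustrative sketch only:

```shell
# Illustrative only: make the snapshot jar the first classpath entry so
# its classes take precedence over the bundled jars (as noted above, it
# must come first so it can patch the Lucene jars).
ES_HOME=/opt/elasticsearch-1.3.2           # assumed install location
ES_CLASSPATH="$ES_HOME/lib/elasticsearch-1.3.3-SNAPSHOT.jar:$ES_HOME/lib/*:$ES_HOME/lib/sigar/*"
export ES_CLASSPATH
```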




On Fri, Aug 29, 2014 at 7:36 PM, tony.apo...@iqor.com wrote:

 The easiest for me is to install fresh binaries but I'm not shy about
 learning about Maven while I build it from source.

 Thanks
 Tony


 On Friday, August 29, 2014 11:21:34 AM UTC-4, Jörg Prante wrote:

 Do you want to build from source? Or do you want to install a fresh
 binary?

 At jenkins.elasticsearch.org I can not find any snapshot builds but it
 may be just me.

 It would be a nice add-on to provide snapshot builds for users that
 eagerly await bug fixes or take a ride on the bleeding edge before the next
 release arrives, without release notes etc.

 Jörg


 On Fri, Aug 29, 2014 at 4:29 PM, tony@iqor.com wrote:

 Thanks again and sorry to bother you guys but I'm new to Github and
 don't know what do do from here.  Can you point me to the right place where
 I can take the next step to put this patch on my server?  I only know how
 to untar the tarball I downloaded from the main ES page.

 Thanks.
 Tony


 On Wednesday, August 27, 2014 1:35:06 PM UTC-4, tony@iqor.com wrote:

 Kudos!

 Tony

 On Wednesday, August 27, 2014 1:16:11 PM UTC-4, Jörg Prante wrote:

 All praise should go to the fantastic Elasticsearch team who did not
 hesitate to test the fix immediately and replaced it with a better working
 solution, since the lzf-compress software is having weaknesses regarding
 threadsafety.

 Jörg


 On Wed, Aug 27, 2014 at 7:01 PM, Ivan Brusic iv...@brusic.com wrote:

 Amazing job. Great work.

 --
 Ivan


 On Tue, Aug 26, 2014 at 12:41 PM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 I fixed the issue by setting the safe LZF encoder in LZFCompressor
 and opened a pull request

 https://github.com/elasticsearch/elasticsearch/pull/7466

 Jörg


 On Tue, Aug 26, 2014 at 8:17 PM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 Still broken with lzf-compress 1.0.3

 https://gist.github.com/jprante/d2d829b497db4963aea5

 Jörg


 On Tue, Aug 26, 2014 at 7:54 PM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 Thanks for the logstash mapping command. I can reproduce it now.

 It's the LZF encoder that bails out at org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt

 which uses in turn sun.misc.Unsafe.getInt

 I have created a gist of the JVM crash file at

 https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b

 There has been a fix in LZF lately: https://github.com/ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7

 for version 1.0.3 which has been released recently.

 I will build a snapshot ES version with LZF 1.0.3 and see if this
 works...

 Jörg



 On Mon, Aug 25, 2014 at 11:30 PM, tony@iqor.com wrote:

 I captured a WireShark trace of the interaction between ES and
 Logstash 1.4.1.  The error occurs even before my data is sent.  Can 
 you try
 to reproduce it on your testbed with this message I captured?

 curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d
 @y

 Contests of file 'y:
 {  template : logstash-*,  settings : {
  index.refresh_interval : 5s  },  mappings : {_default_ 
 : {
   _all : {enabled : true},   dynamic_templates : [ {
 string_fields : {   match : *,   
 match_mapping_type
 : string,   mapping : { type : string, 
 index
 : analyzed, omit_norms : true,   fields : {
   raw : {type: string, index : not_analyzed, 
 ignore_above :
 256}   }   } }   } ],   
 properties :
 { @version: { type: string, index: not_analyzed },
   geoip  : {   type : object, dynamic: 
 true,
   path: full, properties : {
 location : { type : geo_point } } }   
 }}
  }}



 On Monday, August 25, 2014 3:53:18 PM UTC-4, tony@iqor.com
 wrote:

 I have no plugins 

Re: How big can/should you scale Elasticsearch

2014-08-29 Thread Nikolas Everett
On Fri, Aug 29, 2014 at 1:27 PM, Rob Blackin robblac...@gmail.com wrote:

 We are trying to implement a 5 TB, 10 Billion item Elasticsearch cluster.
 The key is an integer and the item data is fairly small.


We're running around 5.5TB right now without a problem.  The biggest
annoyance is that rolling restarts take time proportional to how much data
you have.

We have much larger documents than you have, so we only store 181 million or
so.  Our documents are interactively maintained - a consistent portion of
them are updated daily with some creates and a few rare deletes.

You might want to think about how you do sharding - look into routing to
see if you can get away with oversubscribing on shards.  You might also
look into using multiple indexes as well.  Shay gave a talk on how you
could subdivide one large set of data into multiple indexes to help
things.  One 5TB index would be difficult to maintain.  As are any shards
that are more then, say, 20GB.  Just shuffling those shards from system to
system for rebalancing gets expensive.  Merges on those shards have a
higher upper bound on disk io and cache thrash.
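As a back-of-envelope illustration of keeping shards near the ~20 GB mark mentioned above (the 5 TB figure is from this thread; the arithmetic is mine, not Nik's):

```shell
# How many primary shards does 5 TB need if each should stay near 20 GB?
total_gb=$(( 5 * 1024 ))
shard_gb=20
shards=$(( total_gb / shard_gb ))
echo "$shards shards"   # replicas would add further copies on top of this
```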


 We seem to run into issues around loading. Seems to slow down as the index
 gets bigger.


Check on your merge rate.  This is old but it'll give you some idea of what
is going on:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
You can tune this a bit - especially if your data comes in spurts.



 We are doing this on EC2 i2.xlarge nodes.

 How many documents/TB do you think we can load per node max?

 So if we can do 2 Billion each then we need 5 nodes. We are trying to size
 it.


I can't speak to Amazon because we use physical machines.  We use 18
machines with two reasonably nice Intel SSDs per machine, 96GB of RAM, and
pretty sizeable CPUs, and it isn't really enough to handle the query load we
want to throw at it.  I imagine the shape of your load is going to be
different, though.

Nik



Re: Does transport client do scatter gather?

2014-08-29 Thread joergpra...@gmail.com
A node client is not just a non-data node, although it is very close. The ES page
describes a proxy node scenario. Example: you have many HTTP clients and
they search for large result sets. This is often a challenge because of the
high resource contention. One or more data-less proxy nodes can help in
gathering these result sets, leaving the data nodes alone, which just do
the scatter part of the search.

This is similar to how a TransportClient works for a JVM-only client.
TransportClient is also a proxy node that gathers result sets. But with
some subtle difference, you can not connect HTTP clients to a
TransportClient, and because the TransportClient is not a cluster member,
it uses the configured connected nodes as gather nodes within the cluster.
Because there are two gather nodes, this is called an extra hop in
comparison to a Java NodeClient. But, if you add the HTTP client request to
the request scenario mentioned before, there is no extra hop, only an extra
JVM. So the best place for TransportClient is on a remote host.

In Java, NodeClient and TransportClient share the full functionality of ES:
routing requests, round-robin load balancing, etc. For cluster-specific,
server-only services like listening to cluster state or snapshot/restore,
a TransportClient is not feasible; it either can't do it or must ask a node
in the cluster to pass the information along.

Jörg



On Fri, Aug 29, 2014 at 8:54 PM, John Smith java.dev@gmail.com wrote:

 According to this...


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-node.html

 Non data nodes (I assume Node client is equivalent of a non data node) is
 capable of scatter/gather searching. Was wondering if transport can do this
 also?

 2- Does transport support routing if you specify routing field? Or does it
 always round robin regardless?
 On Aug 29, 2014 12:09 PM, joergpra...@gmail.com joergpra...@gmail.com
 wrote:

 I'm not exactly sure what you mean by scatter-gather, but yes, both
 clients can execute requests on all nodes of the cluster.

 Jörg


 On Fri, Aug 29, 2014 at 3:43 PM, John Smith java.dev@gmail.com
 wrote:

 Just as the subject asks or only the node client can do scatter gather?

 Thanks

 --
 You received this message because you are subscribed to the Google
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/b5274032-c142-46df-91e2-f451ab9c069e%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/b5274032-c142-46df-91e2-f451ab9c069e%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to a topic in the
 Google Groups elasticsearch group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/elasticsearch/70zTmEuyWHE/unsubscribe.
 To unsubscribe from this group and all its topics, send an email to
 elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHdeW329GZgvOm3NG0NAuNvEUJftkSMzKTyCAzM1%2B8bFg%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHdeW329GZgvOm3NG0NAuNvEUJftkSMzKTyCAzM1%2B8bFg%40mail.gmail.com?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.

  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAMiEuFSuCrwaF6qoVf3-rsA_NjQKrJjFue62kjVvoiUH8A2rJA%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAMiEuFSuCrwaF6qoVf3-rsA_NjQKrJjFue62kjVvoiUH8A2rJA%40mail.gmail.com?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.




Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-29 Thread tony . aponte
Thank you very much.
Tony

On Friday, August 29, 2014 3:27:33 PM UTC-4, Jörg Prante wrote:

 Quick guide:

 - install Java 7 (or Java 8), Apache Maven, and git, also ensure internet 
 connection to the Maven central repo 

 - clone 1.3 branch only (you could also clone the whole repo and switch to 
 the branch): git clone https://github.com/elasticsearch/elasticsearch.git 
 --branch 1.3 --single-branch es-1.3

 - enter folder es-1.3

 - start build: mvn -DskipTests clean install

 - wait a few minutes while Maven loads all dependent artifacts and 
 compiles ~3000 source files

 The result will be a complete build of all binaries. In the 'target' 
 folder, after the Build complete message of Maven, you will see a file 
 elasticsearch-VERSION.jar

 VERSION is something like 1.3.3-SNAPSHOT. You can copy this file into 
 your existing Elasticsearch 1.3.x installation lib folder. Do not forget 
 to adjust bin/elasticsearch.in.sh to point to the new 
 elasticsearch-VERSION.jar file in the classpath configuration (at the top 
 lines). This must be the first jar on the classpath so it can patch Lucene 
 jars.

 If you have already data in the existing Elasticsearch I recommend to 
 backup everything before starting the new snapshot build - no guarantees, 
 use at your own risk.

 Jörg




 On Fri, Aug 29, 2014 at 7:36 PM, tony@iqor.com javascript: wrote:

 The easiest for me is to install fresh binaries but I'm not shy about 
 learning about Maven while I build it from source.  

 Thanks
 Tony


 On Friday, August 29, 2014 11:21:34 AM UTC-4, Jörg Prante wrote:

 Do you want to build from source? Or do you want to install a fresh 
 binary?

 At jenkins.elasticsearch.org I cannot find any snapshot builds, but it 
 may be just me.

 It would be a nice add-on to provide snapshot builds for users that 
 eagerly await bug fixes or take a ride on the bleeding edge before the next 
 release arrives, without release notes etc.

 Jörg


 On Fri, Aug 29, 2014 at 4:29 PM, tony@iqor.com wrote:

 Thanks again, and sorry to bother you guys, but I'm new to GitHub and 
 don't know what to do from here.  Can you point me to the right place 
 where I can take the next step to put this patch on my server?  I only 
 know how to untar the tarball I downloaded from the main ES page.

 Thanks.
 Tony


 On Wednesday, August 27, 2014 1:35:06 PM UTC-4, tony@iqor.com 
 wrote:

 Kudos!

 Tony

 On Wednesday, August 27, 2014 1:16:11 PM UTC-4, Jörg Prante wrote:

 All praise should go to the fantastic Elasticsearch team, who did not 
 hesitate to test the fix immediately and replaced it with a better-working 
 solution, since the lzf-compress software has weaknesses regarding 
 thread safety.

 Jörg


 On Wed, Aug 27, 2014 at 7:01 PM, Ivan Brusic iv...@brusic.com 
 wrote:

 Amazing job. Great work.

 -- 
 Ivan


 On Tue, Aug 26, 2014 at 12:41 PM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 I fixed the issue by setting the safe LZF encoder in LZFCompressor 
 and opened a pull request 

 https://github.com/elasticsearch/elasticsearch/pull/7466

 Jörg


 On Tue, Aug 26, 2014 at 8:17 PM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 Still broken with lzf-compress 1.0.3

 https://gist.github.com/jprante/d2d829b497db4963aea5

 Jörg


 On Tue, Aug 26, 2014 at 7:54 PM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 Thanks for the logstash mapping command. I can reproduce it now.

 It's the LZF encoder that bails out at 
 org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt, 
 which in turn uses sun.misc.Unsafe.getInt.

 I have created a gist of the JVM crash file at 

 https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b
  
 There has been a fix in LZF lately: 
 https://github.com/ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7 
 for version 1.0.3, which has been released recently.

 I will build a snapshot ES version with LZF 1.0.3 and see if this 
 works...

 Jörg
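
The failure mode Jörg describes above can be sketched in plain Java. This is an illustrative fragment, not the actual lzf-compress code: a byte-wise big-endian read (the approach of the "safe" encoder variant) works at any offset, while a raw word load such as sun.misc.Unsafe.getInt can crash the JVM on strict-alignment architectures like SPARC when the source address is not 4-byte aligned:

```java
// Sketch only; illustrative, not the actual lzf-compress fix.
// Reading a 32-bit big-endian int one byte at a time is alignment-safe;
// sun.misc.Unsafe.getInt compiles to a raw word load, which raises SIGBUS
// on SPARC when the source address is not 4-byte aligned.
public class SafeIntRead {
    static int getIntBE(byte[] buf, int off) {
        return ((buf[off] & 0xff) << 24)
             | ((buf[off + 1] & 0xff) << 16)
             | ((buf[off + 2] & 0xff) << 8)
             |  (buf[off + 3] & 0xff);
    }

    public static void main(String[] args) {
        byte[] b = {0x00, 0x12, 0x34, 0x56, 0x78};
        // Offset 1 is not 4-byte aligned, but the byte-wise read is fine.
        System.out.println(Integer.toHexString(getIntBE(b, 1))); // prints 12345678
    }
}
```

This is essentially the trade-off between the unsafe and safe chunk encoders: the unsafe variant is faster on x86, which tolerates unaligned loads, but it is not portable to strict-alignment architectures.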



 On Mon, Aug 25, 2014 at 11:30 PM, tony@iqor.com wrote:

 I captured a Wireshark trace of the interaction between ES and 
 Logstash 1.4.1.  The error occurs even before my data is sent.  Can you 
 try to reproduce it on your testbed with this message I captured?

 curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y

  Contents of file 'y':
  {
    "template" : "logstash-*",
    "settings" : {
      "index.refresh_interval" : "5s"
    },
    "mappings" : {
      "_default_" : {
        "_all" : { "enabled" : true },
        "dynamic_templates" : [ {
          "string_fields" : {
            "match" : "*",
            "match_mapping_type" : "string",
            "mapping" : {
              "type" : "string", "index" : "analyzed", "omit_norms" : true,
              "fields" : {
                "raw" : { "type" : "string", "index" : "not_analyzed", "ignore_above" : 256 }
              }
            }
          }
        } ],
        "properties" : {
          "@version" : { "type" : "string", "index" : "not_analyzed" },
          "geoip" : { "type" : "object", "dynamic" : true,
  

Re: Replica assignement on the same host

2014-08-29 Thread Ivan Brusic
It's Friday. Can't read. Nevermind. :)


On Fri, Aug 29, 2014 at 5:06 PM, Mark Walkom ma...@campaignmonitor.com
wrote:

 He's running multiple ES instances/nodes per physical machine, i.e. a VM 
 or container or just a second process, so I don't think it's a primary and 
 replica on the same ES instance.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 30 August 2014 05:16, Ivan Brusic i...@brusic.com wrote:

 The replica of a shard should never be on the same node as the primary.
 Where did you notice this anomaly? What version are you using?

 --
 Ivan


 On Fri, Aug 29, 2014 at 3:52 AM, Mark Walkom ma...@campaignmonitor.com
 wrote:

 That's the best method as per
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#allocation-awareness

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 29 August 2014 20:45, 'Nicolas Fraison' via elasticsearch 
 elasticsearch@googlegroups.com wrote:

 Hi,

 I have an ES cluster with 12 data nodes spread across 6 servers (so 2 
 nodes per server), and I saw that replicas of a shard can be allocated on 
 the same server (one on each of the nodes hosted by that server).

 To avoid this I have set these parameters on the cluster:
 node.host: server_name
 cluster.routing.allocation.awareness.attributes: zone, host

 But I'm wondering whether there is a specific parameter for this
 instead of using cluster allocation awareness?

 Nicolas
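
For reference, a minimal elasticsearch.yml sketch of the awareness settings quoted above (the server name is a placeholder; each physical server gets its own value):

```yaml
# elasticsearch.yml, set on every node; node.host must differ per physical server
node.host: server_name   # placeholder: use the actual server's name
cluster.routing.allocation.awareness.attributes: zone, host
```

With host as an awareness attribute, Elasticsearch tries to avoid placing a shard's primary and replica on nodes that share the same host value.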

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQA_uyaQGXdKQt8hZr1X-qq_%2Bm3Rh%2BCpG4Fg_wRqYhdN0Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.