How To Install ElasticSearch Plugin on Ubuntu 12.04 ...

2014-03-21 Thread Anikessh Jain
Hi all,


I want to install the elasticsearch-head and Bigdesk plugins on Ubuntu 12.04, 
but I am getting errors. When I try to install with this command:

*bin/plugin -install mobz/elasticsearch-head*

it fails with:

 Installing mobz/elasticsearch-head...
Trying https://github.com/mobz/elasticsearch-head/archive/master.zip...
Failed to install mobz/elasticsearch-head, reason: failed to download out 
of all possible locations..., use -verbose to get detailed information


*When I download the plugin from Git I get the same error — it cannot reach 
the repositories:*

*bin/plugin -install lukas-vlcek/bigdesk/2.4.0*
Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true
-> Installing lukas-vlcek/bigdesk/2.4.0...
Trying 
http://download.elasticsearch.org/lukas-vlcek/bigdesk/bigdesk-2.4.0.zip...
Trying 
http://search.maven.org/remotecontent?filepath=lukas-vlcek/bigdesk/2.4.0/bigdesk-2.4.0.zip...
Trying 
https://oss.sonatype.org/service/local/repositories/releases/content/lukas-vlcek/bigdesk/2.4.0/bigdesk-2.4.0.zip...
Trying https://github.com/lukas-vlcek/bigdesk/archive/v2.4.0.zip...
Trying https://github.com/lukas-vlcek/bigdesk/archive/master.zip...
Failed to install lukas-vlcek/bigdesk/2.4.0, reason: failed to download out 
of all possible locations..., use -verbose to get detailed information
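A common workaround when `bin/plugin` cannot reach any download location (for example, behind a proxy or firewall) is to fetch the plugin archive manually and install it from the local file. This is only a sketch — the `-url` option and the local path shown here should be verified against your ES version:

```shell
# Download the plugin archive with a proxy-aware tool first,
# then point the plugin script at the local zip instead of the network.
wget https://github.com/mobz/elasticsearch-head/archive/master.zip \
    -O /tmp/elasticsearch-head.zip
bin/plugin -url file:///tmp/elasticsearch-head.zip -install head
```

If `-verbose` shows the downloads are being blocked by a proxy, exporting `http_proxy`/`https_proxy` in the environment before running `bin/plugin` may also help.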

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e1244b7d-53a9-42b4-b2a0-458c0c943779%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: aggreation values with date time

2014-03-21 Thread Subhadip Bagui
Hi,

Please suggest...



Re: Kibana 3.0.0 milestone 5: How to upload dashboards via the UI? API?

2014-03-21 Thread David Pilato
IIRC it does exist. Have a look at the dashboard settings (I don't remember 
the exact tab name though).

By default, gist and file import/export are disabled.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Le 22 mars 2014 à 03:00, Matt  a écrit :

Dashboards can be exported to JSON, and are obviously savable to the 
'kibana-int' index as JSON. The question is, how do you upload them?

We're doing some work with dynamic fields in the dashboards and it seems that 
it's impossible to create them from within the UI, so you have to do it 
locally on your computer and then publish them to Kibana. I could swear that 
in 3.0.0 there used to be a simple method for loading them from a gist or 
something, but now we can't find it. Help?

--Matt


Re: Elastic search indexing documents

2014-03-21 Thread David Pilato
How can we help you?
You did not show what you are actually doing.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Le 22 mars 2014 à 00:22, Deepikaa Subramaniam  a 
écrit :

Hi guys, 

I am new to Elasticsearch. I have set up my environment to use C# + NEST to 
access ES. I can index txt files successfully, and I downloaded the 
attachment mapper plugin to extract data from other document types. However, 
if I try to search for keywords from within the doc, the search doesn't 
return any results. Please help.

public class Doc
{
    public string file_id;
    public string created;
    [ElasticProperty(Type = Nest.FieldType.attachment, Store = true,
        TermVector = Nest.termVectorOption.with_positions_offsets)]
    public string content;
}

Doc doc = new Doc();
doc.content = Convert.ToBase64String(File.ReadAllBytes(path)); // assign to the instance 'doc', not the type 'Doc'


Re: Sporadic NodeDisconnectedException and NoNodeAvailableException Failures

2014-03-21 Thread David Pilato
You should update to the latest 0.90.x version, or to 1.0.1, although it 
probably won't solve your "network" issue.

I suppose you don't have anything in the node logs?
How much heap did you give to your nodes?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Le 21 mars 2014 à 22:26, Matt Greenfield  a 
écrit :

Hi,

We have been seeing sporadic NodeDisconnectedException and 
NoNodeAvailableException in our ES cluster (0.90.7).

Our cluster is made up of 2 data nodes: one holds the single primary shard 
and the other holds its replica. We connect using the Java TransportClient 
configured with both hosts.

We're able to connect, index, and query 98% of the time. I have played around 
with client.transport.ping_timeout and that seems to address our 
NoNodeAvailableException.

However, we haven't been able to figure out the NodeDisconnectedException.

> 2014-03-21 12:53:18,322 DEBUG [ I/O worker #9}] [APP] 
> [.elasticsearch.transport.netty] [Pisces] disconnected from 
> [[#transport#-1][inet[prod-elasticsearch.domain/127.0.0.1:9300]]], channel 
> closed event
> 2014-03-21 12:53:22,402 DEBUG [[generic][T#57]] [APP] 
> [.elasticsearch.transport.netty] [Pisces] connected to node 
> [[#transport#-1][inet[prod-elasticsearch.domain/127.0.0.1:9300]]]


which is then immediately followed by:

> Caused by: org.elasticsearch.transport.NodeDisconnectedException: 
> [][inet[prod-elasticsearch.domain/127.0.0.1:9300]][index] disconnected

These logs are all generated on the client side and there is nothing that 
sticks out in the logs on either of the nodes.

I've seen in other posts that there might be network issues, or that there 
might not be enough resources (CPU and/or memory).

Does anyone have experience with these errors, or know where I should be 
looking?


Re: Node Exception with Transport Client

2014-03-21 Thread David Pilato
You need to update the client to ES 1.0.0.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Le 21 mars 2014 à 22:22, IronMan2014  a écrit :

I posted earlier about an issue I spent quite a bit of time on: trying to 
figure out how to set up an EC2 cluster. I followed the documentation but 
still couldn't get it to work. I have 2 "i.4xlarge" instances.

So I decided to scale the problem down and work with one instance:
I installed the latest 1.0.0 on an EC2 instance along with the AWS plugin 
2.0. The instance works fine via the HTTP API, but not from my Java transport 
client.

Note that the transport client works fine against my local ES server 
(0.90.9), so nothing is wrong with the Java client itself.

//Settings
Settings settings = ImmutableSettings.settingsBuilder()
        .put("client.transport.sniff", true)
        .put("refresh_interval", "-1")    // needs to be reset after indexing is done
        .put("number_of_shards", 1)
        .put("number_of_replicas", "0")
        .put("index.merge.policy.merge_factor", 30)
        .put("client.transport.ping_timeout", "10s")  // node connection timeout
        .build();

But when I use my Java transport client against version 1.0.0 on the 
instance, I get this error:

WARN - Log4jESLogger.internalWarn(124) | [The Stepford Cuckoos] Message not 
fully read (response) for [0] handler 
future(org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler$1@2d35da43), 
error [true], resetting
Exception in thread "main" 
org.elasticsearch.client.transport.NoNodeAvailableException: No node available



Re: Search JAVA search API not working

2014-03-21 Thread David Pilato
client.prepareCount()

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Le 21 mars 2014 à 22:34, Mohit Anchlia  a écrit :

Thanks! Refresh was the issue. Is there an easy way to find how many 
documents are in an index using the Java API?


> On Fri, Mar 21, 2014 at 2:24 PM, Mohit Anchlia  wrote:
> Oh, I forgot refresh. Let me try that. By default it's 1 sec, right?
> 
> 
>> On Fri, Mar 21, 2014 at 2:18 PM, David Pilato  wrote:
>> Are you just searching in your code, or indexing as well?
>> Could it be caused by not refreshing before searching?
>> 
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>> 
>> 
>> Le 21 mars 2014 à 22:13, Mohit Anchlia  a écrit :
>> 
>> The code below doesn't seem to match the document for some reason. The same 
>> query works when run directly via the REST API. Am I doing something wrong 
>> in the code?
>> 
>> queryString = "fields.field1:value1";
>> 
>> searchResponse = client.prepareSearch()
>>         .setQuery(QueryBuilders.queryString(queryString))
>>         .execute()
>>         .actionGet();
>> 
>> Doc that I expect to match
>> '{
>>   "namespace": "ns1",
>>   "id": "3",
>>   "fastId": "f1",
>>   "version": 1393443298027,
>>   "lang": "en",
>>   "opType": "create",
>>   "fields": {
>> "field2": "value2",
>> "field1": "value1"
>>   },
>>   "arrayFields": {}
>> }'
>> 


Kibana 3.0.0 milestone 5: How to upload dashboards via the UI? API?

2014-03-21 Thread Matt
Dashboards can be exported to JSON, and are obviously savable to the 
'kibana-int' index as JSON. The question is, how do you upload them?

We're doing some work with dynamic fields in the dashboards and it seems 
that it's impossible to create them from within the UI, so you have to do 
it locally on your computer and then publish them to Kibana. I could swear 
that in 3.0.0 there used to be a simple method for loading them from a gist 
or something, but now we can't find it. Help?

--Matt



Re: Replica shard stuck at initializing after client and data node restart

2014-03-21 Thread Mark Walkom
What version are you running?

It's odd this would happen if, when you set replicas to zero, the cluster
state is green and your index is OK.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 22 March 2014 06:15, Glenn Snead  wrote:

> I have a six node cluster: 2 master nodes and 4 client / data nodes.  I
> have two indices: one with data and one that is set aside for future
> use.  I'm having trouble with the index that is in use.
> After making some limits.conf configuration changes and restarting the
> impacted nodes, one of my indices' replica shards will not complete
> initialization.
> I wasn't in charge of the node restarts; here is the sequence of events:
> Shut down the client and data nodes on each of the four servers.
> Start the client and data node on each server.
> I don't believe enough time was allowed for the cluster to reallocate or
> move shards.
>
> limits.conf changes:
> - memlock unlimited
> hard nofiles 32000
> soft nofiles 32000
>
> Here's what I have tried thus far:
>
> Drop the replica shard, which brings the cluster status to green.
> Verify the cluster's status - no replication, no reallocating, etc.
> Re-add the replica shard.
>
> Drop the replica shard and stop the data nodes that were to carry the
> replica shard.
> Verify the cluster's status.
> Start the data nodes and allow the cluster to reallocate primary shards.
>  - The cluster's status is green.
> Add the replica shard back to the index.  The replica shard never completes
> initialization, even over a 24 hour period.
>
> I've checked the transaction log files on each node and they are all
> zero-length files.
> The nodes holding the replica shard also hold primary shards for the unused
> index. These nodes had copied their matching primary's index size (as seen
> in Paramedic), but now Paramedic shows an index size of only a few bytes.
> The index folder on the replica shard servers still has the data.
>
> Unknown to me, my target system was put online and my leadership doesn't
> want to schedule an outage window.  Most of my research suggests that I
> drop the impacted index and re-initialize it.  I can replace the data, but
> this would impact the user interface while the index re-ingests the
> documents.  This issue has occurred before on my test system and the fix
> was to rebuild the index.  However, I never learned why the replica shard
> had the issue in the first place.
>
> My questions are:
> - Does the replica-shard-hosting server's index size (shown in Paramedic)
> indicate a course of action?
> - Is it possible to resolve this without dropping the index and
> rebuilding?  I'd hate to resort to that each time we attempt ES server
> maintenance or configuration changes.
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/a868b3da-fd28-49b4-bc8f-2f60f2c34ec7%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624ZsMpqnF16T_3-ZzDy7SjcsFouaDOBQQEEATby%2B7Lorzg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Optimal number of Shards per node

2014-03-21 Thread Mark Walkom
ElasticHQ, Marvel, bigdesk and kopf are some of the better monitoring
plugins.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 22 March 2014 03:56, Rajan Bhatt  wrote:

> Thanks Zach.
>
> So on a single node, this test will tell us how much a single node with a
> single shard can handle. If we then want to deploy more shards per node, we
> need to take into consideration that more shards per node will consume more
> resources (file descriptors, memory, etc.) and performance will degrade
> as more shards are added to a node.
>
> This is tricky, and mileage can vary with different workloads (indexing +
> searching).
>
> Would you be able to describe your deployment at a very high level (number
> of ES nodes + number of indices + shards + replicas) to give us some idea?
> I appreciate your answer and your time.
>
> By the way, which tool do you use for monitoring the ES cluster, and what
> do you monitor?
> Thanks
> Rajan
>
>
> On Thursday, March 20, 2014 2:05:52 PM UTC-7, Zachary Tong wrote:
>>
>> Unfortunately, there is no way that we can tell you an optimal number.
>>  But there is a way that you can perform some capacity tests, and arrive at
>> usable numbers that you can extrapolate from.  The process is very simple:
>>
>>
>>- Create a single index, with a single shard, on a single
>>production-style machine
>>- Start indexing *real, production-style *data.  "Fake" or "dummy"
>>data won't work here, it needs to mimic real-world data
>>- Periodically, run real-world queries that you would expect users to
>>enter
>>- At some point, you'll find that performance is no longer acceptable
>>to you.  Perhaps the indexing rate becomes too slow.  Or perhaps query
>>latency is too slow.  Or perhaps your node just runs out of memory
>>- Write down the number of documents in the shard, and the physical
>>size of the shard
>>
>> Now you know the limit of a single shard given your hardware + queries +
>> data.  Using that knowledge, you can extrapolate given your expected
>> search/indexing load, and how many documents you expect to index over the
>> next few years, etc.
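The extrapolation step described above is simple arithmetic, and can be written down explicitly. A small sketch (class and method names are illustrative, not from any ES API):

```java
public class ShardCapacity {
    // docsPerShardLimit: the per-shard document count found in the
    // single-shard capacity test described above.
    // expectedDocs: the total number of documents expected over the
    // lifetime of the index.
    public static int shardsNeeded(long expectedDocs, long docsPerShardLimit) {
        if (docsPerShardLimit <= 0) {
            throw new IllegalArgumentException("per-shard limit must be positive");
        }
        // Round up: a partial shard still requires a whole shard.
        return (int) ((expectedDocs + docsPerShardLimit - 1) / docsPerShardLimit);
    }
}
```

For example, if the test topped out at 300 million docs per shard and you expect 1 billion docs, you would provision 4 primary shards.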
>>
>> -Zach
>>
>>
>>
>> On Thursday, March 20, 2014 3:29:47 PM UTC-5, Rajan Bhatt wrote:
>>>
>>> Hello,
>>>
>>> I would appreciate it if someone could suggest an optimal number of shards
>>> per ES node for optimal performance, or a recommended way to arrive at the
>>> number of shards given the number of cores and the memory footprint.
>>>
>>> Thanks in advance
>>> Regards
>>> Rajan
>>>


Re: BigDecimal support

2014-03-21 Thread mooky
Fixed. Pull request 
here: https://github.com/elasticsearch/elasticsearch/pull/5491





Elastic search indexing documents

2014-03-21 Thread Deepikaa Subramaniam
Hi guys, 

I am new to Elasticsearch. I have set up my environment to use C# + NEST to 
access ES. I can index txt files successfully, and I downloaded the 
attachment mapper plugin to extract data from other document types. However, 
if I try to search for keywords from within the doc, the search doesn't 
return any results. Please help.

public class Doc
{
    public string file_id;
    public string created;
    [ElasticProperty(Type = Nest.FieldType.attachment, Store = true,
        TermVector = Nest.termVectorOption.with_positions_offsets)]
    public string content;
}

Doc doc = new Doc();
doc.content = Convert.ToBase64String(File.ReadAllBytes(path)); // assign to the instance 'doc', not the type 'Doc'
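The base64 step on the last line can be sanity-checked in isolation: the attachment field simply expects the raw file bytes encoded as a base64 string. A minimal Java sketch of the same encoding (class and method names are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

public class AttachmentEncoder {
    // Encode raw bytes to the base64 string the attachment field expects.
    public static String encodeBytes(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Convenience wrapper mirroring the C# line
    // Convert.ToBase64String(File.ReadAllBytes(path)).
    public static String encodeFile(String path) throws IOException {
        return encodeBytes(Files.readAllBytes(Paths.get(path)));
    }
}
```

If the encoded string round-trips correctly but searches still return nothing, the next thing to verify is that the attachment mapping was actually applied to the index before the documents were indexed.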



Re: Please explain the flow of data?

2014-03-21 Thread Mark Walkom
Yes, you can leverage a master as a search node in that way.

We have a 15 node cluster with 3 masters; I'm thinking I'll add another 2
when we add a few more data nodes in the next few weeks. Essentially you
want an odd number of masters to ensure a quorum can be reached. But when
you start getting large clusters, i.e. tens of nodes, it doesn't make as
much sense to have n/2+1 masters.
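The n/2+1 quorum rule above is what the discovery.zen.minimum_master_nodes setting should be derived from. A quick sketch of the arithmetic (class and method names are illustrative):

```java
public class Quorum {
    // minimum_master_nodes should be floor(n/2) + 1, where n is the number
    // of master-eligible nodes, so that after a network partition at most
    // one side can ever hold an electable majority.
    public static int minimumMasterNodes(int masterEligibleNodes) {
        return masterEligibleNodes / 2 + 1;
    }
}
```

For the 3-master cluster described above this gives 2; for a 16 node cluster where every node is master-eligible it gives 9, matching the (n/2)+1 figure elsewhere in this thread.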

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 22 March 2014 09:36, Josh Harrison  wrote:

> Awesome, OK, thank you.
> Is the logic behind not allowing storage on master nodes twofold:
> to take advantage of a system with limited storage resources,
> and
> to have a dedicated results aggregator / search handler?
>
> I can imagine that if I had a particularly badly written, gnarly search,
> trying to deal with the results on a master while querying at the same
> time could be bad.
>
> So in a 16 node cluster you'd want to have 9 nodes allowed to be masters,
> (n/2)+1?
>
> Thanks again!
> Josh
>
>
> On Friday, March 21, 2014 3:20:24 PM UTC-7, Mark Walkom wrote:
>>
>> A couple of things;
>>
>>    1. You should have n/2+1 masters in your cluster, where n = number of
>>    nodes. This helps prevent split-brain situations and is best practice.
>>    2. Your master nodes can store data; this way you don't need to add
>>    more nodes to fulfil the above.
>>
>> Your indexing scenario is correct.
>> For searching, both replicas and primaries can be queried.
>> For both: adding more masters adds redundancy as per the first two
>> points. Adding more search nodes won't do much other than reduce the
>> load on your masters (unless someone else can add anything I don't know :p).
>>
>> And for your final question, yes that is correct.
>>
>> To give you an idea of practical application, we don't use search nodes
>> but have 3 non-data masters that handle all queries, and a bunch of data
>> only nodes for storing everything.
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 22 March 2014 08:25, Josh Harrison  wrote:
>>
>>> I'm trying to build a basic understanding of how indexing and searching
>>> works, hopefully someone can either point me to good resources or explain!
>>> I'm trying to figure out what having multiple "coordinator" nodes as
>>> defined in the elasticsearch.yml would do, and what having multiple "search
>>> load balancer" nodes would do. Both in the context of indexing and
>>> searching.
>>> Is there a functional difference between a "coordinator" node and a
>>> "search load balancer" node, beyond the fact that a "search load balancer"
>>> node can't be elected master?
>>>
>>>
>>> Say I have a 4 node cluster. There's a master only "coordinator" node,
>>> that doesn't store data, named "master".
>>> node.master: true
>>> node.data: false
>>>
>>> There are three data-only nodes, "A", "B" and "C":
>>> node.master: false
>>> node.data: true
>>>
>>> I have an index "test" with two shards and one replica. Primary shard 0
>>> lives on A, primary shard 1 lives on C, replica shard 0 lives on B, replica
>>> shard 1 lives on A.
>>>
>>> I send the command
>>> curl -XPOST http://master:9200/test/test -d '{"foo":"bar"}'
>>>
>>> A connection is made to master, and the data is sent to master to be
>>> indexed. Master randomly decides to place this document in shard 1, so it
>>> gets sent to primary shard 1 on C and replica shard 1 on A, right? This
>>> is where routing comes in: I can say that the document really should go
>>> to shard 0 because I said so.
>>>
>>> So this is a fairly simple scenario, assuming I'm correct.
>>>
>>> What benefit do I get to indexing when I add more "coordinator" nodes?
>>> node.master: true
>>> node.data: false
>>>
>>> What about if I add "search load balancer" nodes?
>>> node.master: false
>>> node.data: false
>>>
>>>
>>>
>>> How about on the searching side of things?
>>> I send a search to master,
>>> curl -XPOST http://master:9200/test/test/_search -d
>>> '{"query":{"match_all":{}}}'
>>>
>>> Master sends these queries off to A, B and C, who each generate their
>>> own results and return them to master. Each data node queries all the
>>> relevant shards that are present locally and then combines those results
>>> for delivery to master. Do only primary shards get queried, or are replica
>>> shards queried too?
>>> Master takes these combined results from all the relevant nodes and
>>> combines them into the final query response.
>>>
>>> Same questions:
>>> What benefit do I get to searching when I add more nodes that are like
>>> master?
>>> node.master: true
>>> node.data: false
>>>
>>> What about if I add "search load balancer" nodes?
>>> node.master: false
>>> node.data: false
>>>
>>>
>>> Is the only difference between a
>>> node.master: true
>>> node.data: false
>>> and a
>>> node.master: false
>>>  node

Re: Please explain the flow of data?

2014-03-21 Thread Josh Harrison
Awesome, ok, thank you.
Is the logic behind not allowing storage on master nodes to both:
Take advantage of a system with limited storage resources
and
Have a dedicated results aggregator/search handler?

I can imagine that with a particularly badly written, gnarly search, having a 
master both aggregating results and fielding queries at the same time could 
be bad.

So in a 16 node cluster you'd want to have 9 nodes allowed to be masters, 
(n/2)+1?

Thanks again!
Josh
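Josh's arithmetic here checks out against the n/2+1 rule; a minimal sketch of the quorum calculation (the resulting value is what the `discovery.zen.minimum_master_nodes` setting expects; the function name is illustrative):

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum rule from the thread: n/2 + 1, where n = master-eligible nodes."""
    return master_eligible // 2 + 1

# This value would go into discovery.zen.minimum_master_nodes.
print(minimum_master_nodes(4))   # 3
print(minimum_master_nodes(16))  # 9
```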



Re: Please explain the flow of data?

2014-03-21 Thread Mark Walkom
A couple of things;

   1. You should have n/2+1 master-eligible nodes in your cluster, where n =
   the number of nodes. This helps prevent split-brain situations and is best
   practice.
   2. Your master nodes can store data, so you don't need to add more nodes
   to fulfil the above.

Your indexing scenario is correct.
For searching, both replicas and primaries can be queried.
For both: adding more masters adds redundancy as per the first two points.
Adding more search nodes won't do much beyond reducing the load on your
masters (unless someone else can add anything I don't know :p).

And for your final question, yes that is correct.

To give you an idea of practical application, we don't use search nodes but
have 3 non-data masters that handle all queries, and a bunch of data only
nodes for storing everything.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624YebkNK-nJgH63qP2p0pbw4ctUxVoArHYvT0qXDXmPsbQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: how to modify term frequency formula?

2014-03-21 Thread Ivan Brusic
Term frequencies are stored within Lucene, so there is no calculating of
the value, just a lookup in the data structure. You can disable term
frequencies and then create your own in the script, but it would be easier
to calculate that value at index time so that you can access it within your
custom score and not have to iterate through all the terms yourself. Britta
has posted on the mailing list in the past, so hopefully she will reply
with some more authoritative answers, especially ones regarding performance.
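To make the trade-off concrete, the capping idea from the question versus Lucene's classic tf can be sketched numerically (plain Python; function names are illustrative, not an Elasticsearch API):

```python
import math

def default_tf(freq: int) -> float:
    # Lucene's classic tf factor: sqrt(term frequency)
    return math.sqrt(freq)

def capped_tf(freq: int, cap: float) -> float:
    # The min(my_max_value, sqrt(freq)) idea from the question
    return min(cap, math.sqrt(freq))

print(default_tf(100))      # 10.0 -- heavy repetition dominates
print(capped_tf(100, 3.0))  # 3.0  -- capped, repetition no longer dominates
print(capped_tf(4, 3.0))    # 2.0  -- low frequencies are unaffected
```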

-- 
Ivan


On Fri, Mar 21, 2014 at 11:54 AM, geantbrun  wrote:

> Thanks a lot Ivan, great answer.
>
> Suppose I use in my script my own formula for tf (with
> _index[field][term].tf()) and set the boost_mode to "replace", does
> elasticsearch calculate the tf two times or once only? In other words, is
> it computationally efficient to calculate my own tf? Should I turn off other
> calculations made by ES somewhere else to avoid double calculations?
>
> Cheers,
> Patrick
>
> On Thursday, March 20, 2014 at 17:44:53 UTC-4, Ivan Brusic wrote:
>>
>> You can provide your own similarity to be used at the field level, but
>> recent version of elasticsearch allows you to access the tf-idf values in
>> order to do custom scoring [1]. Also look at Britta's recent talk on the
>> subject [2].
>>
>> That said, either your custom similarity or custom scoring would need
>> access to what exactly are the terms which are repeated many times. Have
>> you looked into omitting term frequencies? It would completely bypass using
>> term frequencies, which might be an overkill in your case. Look into the
>> index options [3].
>>
>> Finally, perhaps the common terms query can help [4].
>>
>> [1] http://www.elasticsearch.org/guide/en/elasticsearch/
>> reference/current/modules-advanced-scripting.html
>>
>> [2] https://speakerdeck.com/elasticsearch/scoring-for-human-beings
>>
>> [3] http://www.elasticsearch.org/guide/en/elasticsearch/
>> reference/current/mapping-core-types.html#string
>>
>> [4] http://www.elasticsearch.org/guide/en/elasticsearch/
>> reference/current/query-dsl-common-terms-query.html
>>
>> Cheers,
>>
>> Ivan
>>
>>
>> On Thu, Mar 20, 2014 at 8:08 AM, geantbrun  wrote:
>>
>>> Hi,
>>> If I understand well, the formula used for the term frequency part in
>>> the default similarity module is the square root of the actual frequency.
>>> Is it possible to modify that formula to include something like a
>>> min(my_max_value,sqrt(frequency))? I would like to avoid huge tf's for
>>> documents that have the same term repeated many times. It seems that BM25
>>> similarity has a parameter to control saturation but I would prefer to
>>> stick with the simple tf/idf similarity module.
>>> Thank you for your help
>>> Patrick
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>>
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/9a12b611-d08d-41f9-8fd4-b74ad75a6a5c%
>>> 40googlegroups.com
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/64a9a877-8a97-462b-bbc2-5f2280b14d2f%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCoMY8N2YgWCuzsh9MFnaQUZA6e3dhza%3DFPaB2JzUYV3Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Search JAVA search API not working

2014-03-21 Thread Mohit Anchlia
Thanks! Refresh was the issue. Is there an easy way to find how many
documents are in an index using the Java API?



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAOT3TWoev84CMWigQJMkBEaW_KsGpYc%2BWCA9qaHPQxXUt2sr8A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Sporadic NodeDisconnectedException and NoNodeAvailableException Failures

2014-03-21 Thread Matt Greenfield
Hi,

We have been seeing sporadic NodeDisconnectedException and 
NoNodeAvailableException in our ES cluster (0.90.7).

Our cluster is made up of 2 data nodes. One data node has a single primary 
shard and one data node has a single replica shard. We connect using the 
Java TransportClient configured with both hosts. 

We're able to connect and index and query 98% of the time. I have played 
around with client.transport.ping_timeout and that seems to address our 
NoNodeAvailableException.

However, we haven't been able to figure out the NodeDisconnectedException.

2014-03-21 12:53:18,322 DEBUG [ I/O worker #9}] [APP] 
> [.elasticsearch.transport.netty] [Pisces] *disconnected* from 
> [[#transport#-1][inet[prod-elasticsearch.domain/127.0.0.1:9300]]], *channel 
> closed event*
> 2014-03-21 12:53:22,402 DEBUG [[generic][T#57]] [APP] 
> [.elasticsearch.transport.netty] [Pisces] *connected* to node 
> [[#transport#-1][inet[prod-elasticsearch.domain/127.0.0.1:9300]]]


which is then immediately followed by:

Caused by: org.elasticsearch.transport.*NodeDisconnectedException*: 
> [][inet[prod-elasticsearch.domain/127.0.0.1:9300]][index] *disconnected*


These logs are all generated on the *client side* and there is nothing that 
sticks out in the logs on either of the nodes.

I've seen in other posts that there might be network issues or that there 
might not be enough resources (cpu and/or memory). 

Does anyone have experience with these errors or know where I might want to 
be looking?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4494bbe9-e3f8-4069-a093-daa103e8f980%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Please explain the flow of data?

2014-03-21 Thread Josh Harrison
I'm trying to build a basic understanding of how indexing and searching 
works, hopefully someone can either point me to good resources or explain!
I'm trying to figure out what having multiple "coordinator" nodes as 
defined in the elasticsearch.yml would do, and what having multiple "search 
load balancer" nodes would do. Both in the context of indexing and 
searching.
Is there a functional difference between a "coordinator" node and a "search 
load balancer" node, beyond the fact that a "search load balancer" node 
can't be elected master?


Say I have a 4 node cluster. There's a master only "coordinator" node, that 
doesn't store data, named "master". 
node.master: true
node.data: false

There are three data only nodes, "A", "B" and "C" 
node.master: false
node.data: true

I have an index "test" with two shards and one replica. Primary shard 0 
lives on A, primary shard 1 lives on C, replica shard 0 lives on B, replica 
shard 1 lives on A.

I send the command
curl -XPOST http://master:9200/test/test -d '{"foo":"bar"}'

A connection is made to master, and the data is sent to master to be 
indexed. Master randomly decides to place this document in shard 1, so it 
gets sent to the primary shard 1 on C and replica shard 1 on B, right? This 
is where routing can come in, I can say that that document really should go 
to shard 0 because I said so.
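That is essentially the mechanism: by default the routing value is the document id, and the target shard is a hash of that value modulo the number of primary shards. A conceptual sketch (Elasticsearch uses its own internal hash function; crc32 here is only a stand-in, and the function name is illustrative):

```python
import zlib

def shard_for(routing: str, num_primary_shards: int) -> int:
    # Conceptual stand-in for Elasticsearch's routing rule:
    # shard = hash(routing_value) % number_of_primary_shards
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

# The same routing value always lands on the same shard, which is also
# why the primary shard count of an index cannot change after creation.
s = shard_for("my-doc-id", 2)
assert s in (0, 1)
assert shard_for("my-doc-id", 2) == s
```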

So this is a fairly simple scenario, assuming I'm correct.

What benefit do I get to indexing when I add more "coordinator" nodes?
node.master: true
node.data: false

What about if I add "search load balancer" nodes?
node.master: false
node.data: false



How about on the searching side of things?
I send a search to master,
curl -XPOST http://master:9200/test/test/_search -d 
'{"query":{"match_all":{}}}'

Master sends these queries off to A, B and C, who each generate their own 
results and return them to master. Each data node queries all the relevant 
shards that are present locally and then combines those results for 
delivery to master. Do only primary shards get queried, or are replica 
shards queried too? 
Master takes these combined results from all the relevant nodes and 
combines them into the final query response.

Same questions:
What benefit do I get to searching when I add more nodes that are like 
master?
node.master: true
node.data: false

What about if I add "search load balancer" nodes?
node.master: false
node.data: false


Is the only difference between a 
node.master: true
node.data: false
and a
node.master: false
node.data: false
that the node is a candidate to be a master, should it be elected?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/eaff1d85-1e85-422d-bfba-9a0825ed5da9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Search JAVA search API not working

2014-03-21 Thread Mohit Anchlia
Oh, I forgot refresh. Let me try that. By default it's 1 sec, right?



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAOT3TWoEuUrsy1j-i_yTcQxoRyCR-nsYyDTQwvw_QT-6dsYTnQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Node Exception with Transport Client

2014-03-21 Thread IronMan2014
I posted earlier about an issue I spent quite a bit of time trying to figure 
out: how to set up an EC2 cluster. I followed the documentation but still 
couldn't get it to work. I have 2 "i.4xlarge" instances.

So I decided to scale down the problem and work with one instance: 
I installed the latest 1.0.0 on an EC2 instance plus "aws plugin 2.0". The 
instance works fine over the HTTP API but not from my Java transport client.

Note: the transport client works fine against my local ES server (0.90.9), 
so nothing is wrong with the Java client itself.

// Settings
Settings settings = ImmutableSettings.settingsBuilder()
    .put("client.transport.sniff", true)
    .put("refresh_interval", "-1")               // needs to be updated after indexing is done
    .put("number_of_shards", 1)
    .put("number_of_replicas", "0")
    .put("index.merge.policy.merge_factor", 30)
    .put("client.transport.ping_timeout", "10s") // node connection timeout



But when I use my Java transport client against version 1.0.0 on the 
instance, I get this error:

WARN - Log4jESLogger.internalWarn(124) | [The Stepford Cuckoos] Message not 
fully read (response) for [0] handler 
future(org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler$1@2d35da43), 
error [true], resetting
Exception in thread "main" 
org.elasticsearch.client.transport.NoNodeAvailableException: No node 
available
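One detail worth checking: "Message not fully read" from the transport client is the classic symptom of a client/server version mismatch, and here a 0.90.9-era client jar is talking to a 1.0.0 node, whose transport protocol is not compatible. Aligning the client jar with the server version is worth trying first (a Maven sketch; the version number is taken from the post):

```xml
<!-- The Java transport client jar should match the server version (1.0.0 here). -->
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch</artifactId>
  <version>1.0.0</version>
</dependency>
```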



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d0ecf9ca-619d-4127-b534-01147d829a79%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Search JAVA search API not working

2014-03-21 Thread David Pilato
Are you just searching in your code or indexing as well?
Could it be caused because you did not refresh before searching?
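The refresh in question is Elasticsearch's near-real-time refresh: newly indexed documents only become searchable after the next refresh. The interval is configurable per index, with a default of 1s (a config sketch):

```yaml
# Index-level setting (elasticsearch.yml or the index settings API).
# Documents indexed since the last refresh are not yet visible to search.
index.refresh_interval: 1s
```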

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0772CCDA-1515-4B40-8808-FA701E358327%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Search JAVA search API not working

2014-03-21 Thread Mohit Anchlia
The below code doesn't seem to match the document for some reason. Same
query when run directly using REST API works. Am I doing something wrong in
the code?

queryString="fields.field1:value1";

searchResponse = client.prepareSearch()
        .setQuery(QueryBuilders.queryString(queryString))
        .execute()
        .actionGet();

Doc that I expect to match
'{
  "namespace": "ns1",
  "id": "3",
  "fastId": "f1",
  "version": 1393443298027,
  "lang": "en",
  "opType": "create",
  "fields": {
"field2": "value2",
"field1": "value1"
  },
  "arrayFields": {}
}'

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAOT3TWoSGTpNQpHBUUNpa0qQr0X8rm7pZXoqxYqcb-349xkrmA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Simple query string search

2014-03-21 Thread Mohit Anchlia
Please ignore; I inadvertently updated one of the documents.


On Fri, Mar 21, 2014 at 1:29 PM, Mohit Anchlia wrote:

> I am trying to run a simple search and for some reason it's not working. I
> might just be making a simple mistake here. I indexed 2 documents:
>
> [ec2-user@ip-10-80-140-13 ~]$ curl -XPUT '
> http://localhost:9200/users/payments/2' -d '{
> >   "namespace": "ns1",
> >   "id": "1",
> >   "fastId": "f1",
> >   "version": 1393443298027,
> >   "lang": "en",
> >   "opType": "create",
> >   "fields": {
> > "field2": "value2",
> > "field1": "value11"
> >   },
> >   "arrayFields": {}
> > }'
>
> {"_index":"users","_type":"payments","_id":"2","_version":1,"created":true}[
>
> [ec2-user@ip-10-80-140-13 ~]$ curl -XPUT '
> http://localhost:9200/users/payments/1' -d '{
> >   "namespace": "ns1",
> >   "id": "2",
> >   "fastId": "f1",
> >   "version": 1393443298027,
> >   "lang": "en",
> >   "opType": "create",
> >   "fields": {
> > "field2": "value2",
> > "field1": "value1"
> >   },
> >   "arrayFields": {}
> > }'
> {"_index":"users","_type":"payments","_id":"1","_version":1,"created":true}
>
> And now I am trying to run a simple search and it doesn't seem to return
> any docs. I expected 2 docs to match:
>
> [ec2-user@ip-10-80-140-13 ~]$ curl -XGET '
> http://localhost:9200/users/payments/_search?q=field.field1:value1'
>
> {"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
>
> [ec2-user@ip-10-80-140-13 ~]$ curl -XGET '
> http://localhost:9200/users/payments/_search?q=value1'
>
> {"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAOT3TWr%3DfW5-w0W%3DRMNjjgE%3DpmuCrxBV8XAQiWXQ8r-APspkbw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Getting [marvel.agent.exporter ] [Tempest] remote target didn't respond with 200 OK response code [404 Not Found

2014-03-21 Thread Dennis Andersen
Yes, that was it! Thanks!

Dennis
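For anyone hitting the same 404: when automatic index creation is disabled globally, Marvel cannot create its daily indices. The linked settings page suggests whitelisting them, roughly (a config sketch; the exact pattern may vary by Marvel version):

```yaml
# Allow Marvel's .marvel-* indices to be auto-created even when
# auto-creation is otherwise disabled.
action.auto_create_index: .marvel-*
```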

On Friday, March 21, 2014 11:29:13 AM UTC-7, Boaz Leskes wrote:
>
> Hi Dennis,
>
> Do you have action.auto_create_index: false in your elasticsearch.yml? If
> so, see
> http://www.elasticsearch.org/guide/en/marvel/current/#relevant-settings
>
> Cheers,
> Boaz
>
> On Thursday, March 20, 2014 1:35:05 AM UTC+1, Dennis Andersen wrote:
>>
>> My two clusters, currently default named, Tempest and Morg, appear to be 
>> working fine. The head plugin shows green and queries work. I have indices 
>> and documents added.
>>
>> When I added Marvel, restarted both nodes, I see the error
>>  [marvel.agent.exporter] [Tempest] remote target didn't respond with 
>> 200 OK response code [404 Not Found
>> repeated in each cluster's log files (with the names changed accordingly).
>>
>> Marvel has a blank screen and reports "No results There were no results 
>> because no indices were found that match your selected time span".
>>
>> I am running version 1.0.1 on each server, have Java (build 1.7.0_25-b15) 
>> on each. No configuration changes were made -- meaning no marvel.agent 
>> properties were set in the configuration.
>> I do have 
>> discovery.zen.ping.multicast.enabled: false
>>
>> Any suggestions?
>>
>> thanks,
>>
>> Dennis
>>
>



Increasing relevance with additional matching terms matching, but with constant scores

2014-03-21 Thread Jorge T
Hi all,

I am working on an autocomplete implementation, and am trying to do the 
following:

Given a user query: minneapolis hotel ivy

I want to be able to query one or more fields, and boost the relevancy for 
each term that matches. Many are already jumping up and saying "the match 
query already does that!", but the catch is, we rely heavily on popularity 
for sorting, so 3 terms matching should always give the same base level of 
relevance. Using match, the term frequencies come into play, and the score 
for "minneapolis hotel ivy" will be different from "boston hotel ivy", with 
3 terms matching in both, which is not what we want.

A simple solution is to tokenize these terms in the calling code and use a 
bool query with match clauses and constant score. For example:

{
  "bool": {
    "should": [
      { "constant_score": { "query": { "term": { "name": "minneapolis" } }, "boost": 1 } },
      { "constant_score": { "query": { "term": { "name": "hotel" } }, "boost": 1 } },
      { "constant_score": { "query": { "term": { "name": "ivy" } }, "boost": 1 } }
    ]
  }
}

This is fine for English and most Western languages, but becomes more 
complicated and undesirable in other languages, primarily Chinese, 
Japanese, Korean, etc. I want to be able to use custom tokenizers there, 
and just leave the dirty work to ES.

I'm fairly certain there isn't an easy solution here, but just wanted to 
double check. Thanks in advance!
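For reference, the per-term constant-score query can be generated from any token list in the calling code — a minimal sketch (the field name and tokens are from the example above; in practice the tokens would come from ES's `_analyze` API so the server-side tokenizer, e.g. a CJK one, does the splitting):

```python
import json

def constant_score_bool(field, tokens, boost=1):
    """Build a bool/should query where each matching token contributes a
    constant boost, so N matching terms always yield the same base score
    regardless of term frequencies."""
    return {
        "bool": {
            "should": [
                {"constant_score": {"query": {"term": {field: t}}, "boost": boost}}
                for t in tokens
            ]
        }
    }

# Tokens here are hard-coded; normally they would be fetched from the
# _analyze API, e.g.:
#   curl 'localhost:9200/myindex/_analyze?analyzer=my_analyzer' -d 'minneapolis hotel ivy'
query = constant_score_bool("name", ["minneapolis", "hotel", "ivy"])
print(json.dumps(query, indent=2))
```

The trade-off is an extra round trip per query to `_analyze`, but it keeps the tokenization logic in one place (the index settings) rather than duplicating it in the client.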




Simple query string search

2014-03-21 Thread Mohit Anchlia
I am trying to run a simple search and for some reason it's not working. I
might just be making a simple mistake here. I indexed 2 documents:

[ec2-user@ip-10-80-140-13 ~]$ curl -XPUT '
http://localhost:9200/users/payments/2' -d '{
>   "namespace": "ns1",
>   "id": "1",
>   "fastId": "f1",
>   "version": 1393443298027,
>   "lang": "en",
>   "opType": "create",
>   "fields": {
> "field2": "value2",
> "field1": "value11"
>   },
>   "arrayFields": {}
> }'
{"_index":"users","_type":"payments","_id":"2","_version":1,"created":true}[

[ec2-user@ip-10-80-140-13 ~]$ curl -XPUT '
http://localhost:9200/users/payments/1' -d '{
>   "namespace": "ns1",
>   "id": "2",
>   "fastId": "f1",
>   "version": 1393443298027,
>   "lang": "en",
>   "opType": "create",
>   "fields": {
> "field2": "value2",
> "field1": "value1"
>   },
>   "arrayFields": {}
> }'
{"_index":"users","_type":"payments","_id":"1","_version":1,"created":true}

And now I am trying to run a simple search and it doesn't seem to return
any docs. I expected 2 docs to match:

[ec2-user@ip-10-80-140-13 ~]$ curl -XGET '
http://localhost:9200/users/payments/_search?q=field.field1:value1'
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

[ec2-user@ip-10-80-140-13 ~]$ curl -XGET '
http://localhost:9200/users/payments/_search?q=value1'
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
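One thing worth checking: the indexed documents nest the values under `fields`, while the first query references `field.field1`. A minimal sketch of building the query-string URL with the nested path, assuming that mismatch is the cause (host, port, and index names are from the examples above):

```python
from urllib.parse import quote

host = "http://localhost:9200"
path = "/users/payments/_search"
# The documents were indexed with "fields": {"field1": ...}, so the
# query string should reference fields.field1, not field.field1.
q = "fields.field1:value1"
url = f"{host}{path}?q={quote(q)}"
print(url)
```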



Elasticsearch for Ruby on Rails: An Introduction to Chewy

2014-03-21 Thread H Singer


Building on the foundation of Elasticsearch and the Elasticsearch-Ruby 
client, Chewy is a Ruby gem that extends (and simplifies) the 
Elasticsearch application search architecture, while also providing tighter 
integration with Rails.

This post provides an introduction to Chewy (with code samples), including 
a discussion of the technical obstacles that emerged during implementation:

http://www.toptal.com/ruby-on-rails/elasticsearch-for-ruby-on-rails-an-introduction-to-chewy




nested aggregations in kibana

2014-03-21 Thread Jon Stearley
Hello.  I'm trying to make a 2D contingency table in Kibana (eg domain by 
_type).  The below in Chrome/Sense returns reasonable results, but how do I 
get this displayed in kibana?  I'm trying to use a table panel but can't 
figure out where to put the query - maybe a different panel or future 
aggregations panel is needed?  Thanks.

-Jon Stearley


HERE IS THE QUERY:

GET /_all/_search
{
  "aggregations": {
    "domains": {
      "terms": {
        "field": "DOMAIN"
      },
      "aggregations": {
        "bytype": {
          "terms": {
            "field": "_type"
          }
        }
      }
    }
  },
  "size": 0
}





HERE IS AN EXAMPLE RESULT (actual domains and types anonymized):



{
  "took": 252,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "failed": 0
  },
  "hits": {
    "total": 967233,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "domains": {
      "buckets": [
        {
          "key": "bleh.blech",
          "doc_count": 3508,
          "bytype": {
            "buckets": [
              { "key": "type", "doc_count": 3506 },
              { "key": "typeb", "doc_count": 2 }
            ]
          }
        },
        {
          "key": "foo.gah",
          "doc_count": 3470,
          "bytype": {
            "buckets": [
              { "key": "typea", "doc_count": 3400 },
              { "key": "typeb", "doc_count": 70 }
            ]
          }
        }
      ]
    }
  }
}
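Pending panel support for nested aggregations, one workaround is to flatten such a response into a 2D table outside Kibana — a minimal sketch over a response shaped like the (anonymized) example above:

```python
def contingency_table(agg_response):
    """Flatten a two-level terms aggregation (domain -> _type) into a
    {(domain, type): doc_count} mapping, i.e. the cells of a 2D
    contingency table."""
    table = {}
    for domain in agg_response["aggregations"]["domains"]["buckets"]:
        for t in domain["bytype"]["buckets"]:
            table[(domain["key"], t["key"])] = t["doc_count"]
    return table

# A response shaped like the example result above:
response = {
    "aggregations": {
        "domains": {
            "buckets": [
                {"key": "bleh.blech", "doc_count": 3508, "bytype": {"buckets": [
                    {"key": "typea", "doc_count": 3506},
                    {"key": "typeb", "doc_count": 2},
                ]}},
                {"key": "foo.gah", "doc_count": 3470, "bytype": {"buckets": [
                    {"key": "typea", "doc_count": 3400},
                    {"key": "typeb", "doc_count": 70},
                ]}},
            ]
        }
    }
}
print(contingency_table(response))
```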



Re: Java Serialization of Exceptions

2014-03-21 Thread Chris Berry
We've built our own Elasticsearch Client that has niceties like OAuth, the 
ability to swap Clusters for maintenance, Zookeeper integration, ease of 
configuration, retries, etc.
Otherwise we'd have a lot of "wheel reinventing" going on.
Plus, the Java Client is pretty nice, after all. ;-)

Thanks for all the input.
(And kimchy, thanks for the brilliant product)

Cheers,
-- Chris 
 
On Mar 21, 2014, at 2:47 PM, Matt Weber  wrote:

> If this is a concern, why not have your client's use the REST api so they 
> don't need to worry about matching their java version with the java version 
> of the search cluster?
> 
> Thanks,
> Matt Weber
> 
> 
> 
> On Fri, Mar 21, 2014 at 12:07 PM, kimchy  wrote:
> Not trivializing the bug at all, god knows I spend close to a week tracing it 
> down to a JVM backward incompatibility change, but this happened once over 
> the almost 5 years Elasticsearch existed. To introduce a workaround to 
> something that happened once, compared to potential bugs in the workaround 
> (Jackson is great, but what would happen if there was a bug in it for 
> example) is not a great solution. Obviously, if this happened more often, 
> then this is something we need to address.
> 
> On Friday, March 21, 2014 7:12:02 PM UTC+1, Chris Berry wrote:
> If it happened once, then by definition it will happen again. History repeats 
> itself. ;-)
> 
> What exactly would you lose?
> You are simply trading one rigid serialization scheme for another more 
> lenient one.
> Yes, you would have to introduce something like Jackson's Object Mapper, but 
> that seems to be the defacto standard today and with your use of the Shade 
> Plugin it wouldn't really be a burden on the Client anyway.
> 
> With all due respect, you may be trivializing the impact of this one time bug.
> It is difficult, at best, to inform all the Clients of your Cluster; "Hey, if 
> you want to see what your Exceptions really are, then upgrade your JVM" 
> Especially in large SOA shops
> 
> This just decouples the Client and Server deployments.
> 
> Thanks much,
> -- Chris 
> 
> On Mar 21, 2014, at 12:18 PM, kimchy  wrote:
> 
>> I wonder why you are asking for this feature? If its because Java broke 
>> backward comp. on serialization of InetAddress that we use in our 
>> exceptions, then its a bug in Java serialization, hard for us to do 
>> something about it. 
>> 
>> You will loose a lot by trying to serialize exceptions using JSON, and we 
>> prefer not to introduce dependency on ObjectMapper in Jackson, or try and 
>> serialize exceptions using Jackson.
>> 
>> I would be very careful in introducing this just because of a (one time bug) 
>> in Java.
>> 
>> On Friday, March 21, 2014 5:18:38 PM UTC+1, Chris Berry wrote:
>> Greetings,
>> 
>> Let me say up-front, I am a huge fan and proponent of Elasticsearch. It is a 
>> beautiful tool.
>> 
>> So, that said, it surprises me that Elasticsearch has such a pedestrian 
>> flaw, and serializes it's Exceptions using Java Serialization.
>> In a big shop it is quite difficult (i.e. next to impossible) to keep all 
>> the ES Clients on the same exact JVM as Elasticsearch, and thus, it is not 
>> uncommon to get TransportSerializationExceptions instead of the actual 
>> underlying problem.
>> I was really hoping this would be corrected in ES 1.0.X, but no such luck. 
>> (As far as I can tell...)
>> 
>> It seems that this is pretty easily fixed?
>> Just switch to a JSON representation of the basic Exception and gracefully 
>> (forwards-compatibly) attempt to re-hydrate the actual Exception class. 
>> You'd just have to drop an additional "header" in the stream that tells the 
>> code it is a JSON response and route to the right Handler it accordingly. If 
>> the header is missing, then do things the old way with Java Serialization??
>> 
>> Are there any plans to fix this? Or perhaps to entertain a Pull Request?
>> It may seem just an annoyance, but it is actually pretty bad, in that it 
>> keeps Clients from seeing their real issues. Especially in shops where it is 
>> difficult to see the Production logs of Elasticsearch itself. 
>> 
>> Thanks much,
>> -- Chris 
>> 
>> 
>> 
> 
> 

Re: Java Serialization of Exceptions

2014-03-21 Thread joergpra...@gmail.com
What I'm interested in would be a perspective that ES nodes could
communicate with other ES nodes by transparent (readable) data streams
specified by an "ES node protocol", independent of Java serialization. So,
ES nodes in the long run could be implemented in even another language on
the JVM that may not be able to handle the internals of Java serialization
stream protocol. I know this could be implemented by a plugin. But a
standard, transparent node communication protocol in the core would even be
better.

Looking back in Java history there were two visible breaking serialization
changes. First was the jump from 1.1 to 1.2, when 1.1 JVMs couldn't read
objects from 1.2 JVMs over the network any more. And second was for the 1.5
JVMs when enums were added to the serialization. This broke the
communication with 1.4 JVMs. That was all before ES, now we don't have to
care about any JVM <6. And recently there was the invisible, hard-to-reveal
breakage, the InetAddress class, due to an undisclosed "security fix",
and yes, it's a bug that cannot be fixed from within ES.

Let's cross fingers the next serialization break will not take place so
soon. Maybe when Project Jigsaw has been addressed in Java 9 and ES must
move to Java 9. That feels like a very long way ahead. So I'm fine with
waiting for when the time is right.

Jörg



On Fri, Mar 21, 2014 at 8:07 PM, kimchy  wrote:

> Not trivializing the bug at all, god knows I spend close to a week tracing
> it down to a JVM backward incompatibility change, but this happened once
> over the almost 5 years Elasticsearch existed. To introduce a workaround to
> something that happened once, compared to potential bugs in the workaround
> (Jackson is great, but what would happen if there was a bug in it for
> example) is not a great solution. Obviously, if this happened more often,
> then this is something we need to address.
>
> On Friday, March 21, 2014 7:12:02 PM UTC+1, Chris Berry wrote:
>
>> If it happened once, then by definition it will happen again. History
>> repeats itself. ;-)
>>
>> What exactly would you lose?
>> You are simply trading one rigid serialization scheme for another more
>> lenient one.
>> Yes, you would have to introduce something like Jackson's Object Mapper,
>> but that seems to be the defacto standard today and with your use of the
>> Shade Plugin it wouldn't really be a burden on the Client anyway.
>>
>> With all due respect, you may be trivializing the impact of this one time
>> bug.
>> It is difficult, at best, to inform all the Clients of your Cluster;
>> "Hey, if you want to see what your Exceptions really are, then upgrade your
>> JVM"
>> Especially in large SOA shops
>>
>> This just decouples the Client and Server deployments.
>>
>> Thanks much,
>> -- Chris
>>
>> On Mar 21, 2014, at 12:18 PM, kimchy  wrote:
>>
>> I wonder why you are asking for this feature? If its because Java broke
>> backward comp. on serialization of InetAddress that we use in our
>> exceptions, then its a bug in Java serialization, hard for us to do
>> something about it.
>>
>> You will loose a lot by trying to serialize exceptions using JSON, and we
>> prefer not to introduce dependency on ObjectMapper in Jackson, or try and
>> serialize exceptions using Jackson.
>>
>> I would be very careful in introducing this just because of a (one time
>> bug) in Java.
>>
>> On Friday, March 21, 2014 5:18:38 PM UTC+1, Chris Berry wrote:
>>>
>>> Greetings,
>>>
>>> Let me say up-front, I am a huge fan and proponent of Elasticsearch. It
>>> is a beautiful tool.
>>>
>>> So, that said, it surprises me that Elasticsearch has such a pedestrian
>>> flaw, and serializes it's Exceptions using Java Serialization.
>>> In a big shop it is quite difficult (i.e. next to impossible) to keep
>>> all the ES Clients on the same exact JVM as Elasticsearch, and thus, it is
>>> not uncommon to get TransportSerializationExceptions instead of the
>>> actual underlying problem.
>>> I was really hoping this would be corrected in ES 1.0.X, but no such
>>> luck. (As far as I can tell...)
>>>
>>> It seems that this is pretty easily fixed?
>>> Just switch to a JSON representation of the basic Exception and
>>> gracefully (forwards-compatibly) attempt to re-hydrate the actual Exception
>>> class.
>>> You'd just have to drop an additional "header" in the stream that tells
>>> the code it is a JSON response and route to the right Handler it
>>> accordingly. If the header is missing, then do things the old way with Java
>>> Serialization??
>>>
>>> Are there any plans to fix this? Or perhaps to entertain a Pull Request?
>>> It may seem just an annoyance, but it is actually pretty bad, in that it
>>> keeps Clients from seeing their real issues. Especially in shops where it
>>> is difficult to see the Production logs of Elasticsearch itself.
>>>
>>> Thanks much,
>>> -- Chris
>>>
>>>
>>>

Re: Java Serialization of Exceptions

2014-03-21 Thread Matt Weber
If this is a concern, why not have your client's use the REST api so they
don't need to worry about matching their java version with the java version
of the search cluster?

Thanks,
Matt Weber



On Fri, Mar 21, 2014 at 12:07 PM, kimchy  wrote:

> Not trivializing the bug at all, god knows I spend close to a week tracing
> it down to a JVM backward incompatibility change, but this happened once
> over the almost 5 years Elasticsearch existed. To introduce a workaround to
> something that happened once, compared to potential bugs in the workaround
> (Jackson is great, but what would happen if there was a bug in it for
> example) is not a great solution. Obviously, if this happened more often,
> then this is something we need to address.
>
> On Friday, March 21, 2014 7:12:02 PM UTC+1, Chris Berry wrote:
>
>> If it happened once, then by definition it will happen again. History
>> repeats itself. ;-)
>>
>> What exactly would you lose?
>> You are simply trading one rigid serialization scheme for another more
>> lenient one.
>> Yes, you would have to introduce something like Jackson’s Object Mapper,
>> but that seems to be the defacto standard today and with your use of the
>> Shade Plugin it wouldn’t really be a burden on the Client anyway.
>>
>> With all due respect, you may be trivializing the impact of this one time
>> bug.
>> It is difficult, at best, to inform all the Clients of your Cluster;
>> “Hey, if you want to see what your Exceptions really are, then upgrade your
>> JVM”
>> Especially in large SOA shops
>>
>> This just decouples the Client and Server deployments.
>>
>> Thanks much,
>> — Chris
>>
>> On Mar 21, 2014, at 12:18 PM, kimchy  wrote:
>>
>> I wonder why you are asking for this feature? If its because Java broke
>> backward comp. on serialization of InetAddress that we use in our
>> exceptions, then its a bug in Java serialization, hard for us to do
>> something about it.
>>
>> You will loose a lot by trying to serialize exceptions using JSON, and we
>> prefer not to introduce dependency on ObjectMapper in Jackson, or try and
>> serialize exceptions using Jackson.
>>
>> I would be very careful in introducing this just because of a (one time
>> bug) in Java.
>>
>> On Friday, March 21, 2014 5:18:38 PM UTC+1, Chris Berry wrote:
>>>
>>> Greetings,
>>>
>>> Let me say up-front, I am a huge fan and proponent of Elasticsearch. It
>>> is a beautiful tool.
>>>
>>> So, that said, it surprises me that Elasticsearch has such a pedestrian
>>> flaw, and serializes it's Exceptions using Java Serialization.
>>> In a big shop it is quite difficult (i.e. next to impossible) to keep
>>> all the ES Clients on the same exact JVM as Elasticsearch, and thus, it is
>>> not uncommon to get TransportSerializationExceptions instead of the
>>> actual underlying problem.
>>> I was really hoping this would be corrected in ES 1.0.X, but no such
>>> luck. (As far as I can tell...)
>>>
>>> It seems that this is pretty easily fixed?
>>> Just switch to a JSON representation of the basic Exception and
>>> gracefully (forwards-compatibly) attempt to re-hydrate the actual Exception
>>> class.
>>> You'd just have to drop an additional "header" in the stream that tells
>>> the code it is a JSON response and route to the right Handler it
>>> accordingly. If the header is missing, then do things the old way with Java
>>> Serialization??
>>>
>>> Are there any plans to fix this? Or perhaps to entertain a Pull Request?
>>> It may seem just an annoyance, but it is actually pretty bad, in that it
>>> keeps Clients from seeing their real issues. Especially in shops where it
>>> is difficult to see the Production logs of Elasticsearch itself.
>>>
>>> Thanks much,
>>> -- Chris
>>>
>>>
>>>

Replica shard stuck at initializing after client and data node restart

2014-03-21 Thread Glenn Snead
I have a six-node cluster: 2 master nodes and 4 client / data nodes. I 
have two indices: one with data and one that is set aside for future 
use. I'm having trouble with the index that is in use.
After making some limits.conf configuration changes and restarting the 
impacted nodes, one of my indices' replica shards will not complete 
initialization.
I wasn't in charge of the node restarts; here is the sequence of events:
Shut down the client and data nodes on each of the four servers.
Start the client and data node on each server.
I don't believe the cluster was given time to reallocate or move shards.

limits.conf changes: 
- memlock unlimited
- hard nofiles 32000
- soft nofiles 32000

Here's what I have tried thus far:

Drop the replica shard, which brings the cluster status to Green.  
Verify the cluster's status - no replication, no reallocating, etc.  
Re-add the replica shard.  

Drop the replica shard and the data nodes that were to carry the replica 
shard.  
Verify the cluster's status.  
Start the data nodes and allow the cluster to reallocate primary shards.  
 - The cluster's status is Green.
Add the replica shard to the index. The replica shard never completes 
initialization, even over a 24-hour period.

I've checked the transaction log files on each node and they are all 
zero-length files.
The replica-shard-holding nodes are primary shards for the unused index.
These nodes copied their matching primary node's index size (as seen in 
Paramedic), but now Paramedic shows an index size of only a few bytes. The 
index folder on the replica shard servers still has the data.

Unknown to me, my target system was put online and my leadership doesn't 
want to schedule an outage window. Most of my research suggests that I drop 
the impacted index and re-initialize. I can replace the data, but this 
would impact the user interface while the index re-ingests the 
documents. This issue has occurred before on my test system and the fix was 
to rebuild the index. However, I never learned why the replica shard had 
the issue in the first place.

My questions are:
- Does the replica-shard-hosting server's index size (shown in Paramedic) 
indicate a course of action?
- Is it possible to resolve this without dropping the index and 
rebuilding? I'd hate to resort to this each time we attempt ES server 
maintenance or configuration changes.



Re: Java Serialization of Exceptions

2014-03-21 Thread kimchy
Not trivializing the bug at all, god knows I spent close to a week tracing 
it down to a JVM backward incompatibility change, but this happened once 
over the almost 5 years Elasticsearch has existed. To introduce a workaround to 
something that happened once, compared to potential bugs in the workaround 
(Jackson is great, but what would happen if there was a bug in it for 
example) is not a great solution. Obviously, if this happened more often, 
then this is something we need to address.

On Friday, March 21, 2014 7:12:02 PM UTC+1, Chris Berry wrote:
>
> If it happened once, then by definition it will happen again. History 
> repeats itself. ;-)
>
> What exactly would you lose?
> You are simply trading one rigid serialization scheme for another more 
> lenient one.
> Yes, you would have to introduce something like Jackson’s Object Mapper, 
> but that seems to be the defacto standard today and with your use of the 
> Shade Plugin it wouldn’t really be a burden on the Client anyway.
>
> With all due respect, you may be trivializing the impact of this one time 
> bug.
> It is difficult, at best, to inform all the Clients of your Cluster; “Hey, 
> if you want to see what your Exceptions really are, then upgrade your JVM” 
> Especially in large SOA shops
>
> This just decouples the Client and Server deployments.
>
> Thanks much,
> — Chris 
>
> On Mar 21, 2014, at 12:18 PM, kimchy > 
> wrote:
>
> I wonder why you are asking for this feature? If its because Java broke 
> backward comp. on serialization of InetAddress that we use in our 
> exceptions, then its a bug in Java serialization, hard for us to do 
> something about it. 
>
> You will loose a lot by trying to serialize exceptions using JSON, and we 
> prefer not to introduce dependency on ObjectMapper in Jackson, or try and 
> serialize exceptions using Jackson.
>
> I would be very careful in introducing this just because of a (one time 
> bug) in Java.
>
> On Friday, March 21, 2014 5:18:38 PM UTC+1, Chris Berry wrote:
>>
>> Greetings,
>>
>> Let me say up-front, I am a huge fan and proponent of Elasticsearch. It 
>> is a beautiful tool.
>>
>> So, that said, it surprises me that Elasticsearch has such a pedestrian 
>> flaw, and serializes it's Exceptions using Java Serialization.
>> In a big shop it is quite difficult (i.e. next to impossible) to keep all 
>> the ES Clients on the same exact JVM as Elasticsearch, and thus, it is not 
>> uncommon to get TransportSerializationExceptions instead of the actual 
>> underlying problem.
>> I was really hoping this would be corrected in ES 1.0.X, but no such 
>> luck. (As far as I can tell...)
>>
>> It seems that this is pretty easily fixed?
>> Just switch to a JSON representation of the basic Exception and 
>> gracefully (forwards-compatibly) attempt to re-hydrate the actual Exception 
>> class. 
>> You'd just have to drop an additional "header" in the stream that tells 
>> the code it is a JSON response and route to the right Handler it 
>> accordingly. If the header is missing, then do things the old way with Java 
>> Serialization??
>>
>> Are there any plans to fix this? Or perhaps to entertain a Pull Request?
>> It may seem just an annoyance, but it is actually pretty bad, in that it 
>> keeps Clients from seeing their real issues. Especially in shops where it 
>> is difficult to see the Production logs of Elasticsearch itself. 
>>
>> Thanks much,
>> -- Chris 
>>
>>
>>



Re: how to modify term frequency formula?

2014-03-21 Thread geantbrun
Thanks a lot Ivan, great answer. 

Suppose I use my own formula for tf in my script (with 
_index[field][term].tf()) and set the boost_mode to "replace"; does 
elasticsearch calculate the tf twice or only once? In other words, is 
it computationally efficient to calculate my own tf? Should I turn off other 
calculations made by ES somewhere else to avoid double calculations?

Cheers,
Patrick
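For intuition, the cap proposed in the original question simply saturates the tf contribution — a small numeric sketch (the cap value of 3.0 is arbitrary):

```python
from math import sqrt

def capped_tf(freq, max_value=3.0):
    # The default similarity's tf weight is sqrt(freq); capping it means
    # documents that repeat the same term many times stop gaining score
    # beyond the cap.
    return min(max_value, sqrt(freq))

# With max_value=3.0, any frequency >= 9 contributes the same weight:
for f in (1, 4, 9, 100, 10000):
    print(f, capped_tf(f))
```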

On Thursday, March 20, 2014 5:44:53 PM UTC-4, Ivan Brusic wrote:
>
> You can provide your own similarity to be used at the field level, but 
> recent version of elasticsearch allows you to access the tf-idf values in 
> order to do custom scoring [1]. Also look at Britta's recent talk on the 
> subject [2].
>
> That said, either your custom similarity or custom scoring would need 
> access to what exactly are the terms which are repeated many times. Have 
> you looked into omitting term frequencies? It would completely bypass using 
> term frequencies, which might be an overkill in your case. Look into the 
> index options [3].
>
> Finally, perhaps the common terms query can help [4].
>
> [1] 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
>
> [2] https://speakerdeck.com/elasticsearch/scoring-for-human-beings
>
> [3] 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#string
>
> [4] 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-common-terms-query.html
>
> Cheers,
>
> Ivan
>
>
> On Thu, Mar 20, 2014 at 8:08 AM, geantbrun 
> > wrote:
>
>> Hi,
>> If I understand well, the formula used for the term frequency part in the 
>> default similarity module is the square root of the actual frequency. Is it 
>> possible to modify that formula to include something like a 
>> min(my_max_value,sqrt(frequency))? I would like to avoid huge tf's for 
>> documents that have the same term repeated many times. It seems that BM25 
>> similarity has a parameter to control saturation but I would prefer to 
>> stick with the simple tf/idf similarity module.
>> Thank you for your help
>> Patrick
>>
>>
>
>
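The capped-tf idea from the original question can be sketched as a function_score request body built in Python. This is only an illustration under assumptions: the field name, term, cap value, and the MVEL-style `_index[...].tf()` script access are placeholders, not code from this thread.

```python
import json

# Illustrative cap on the term-frequency contribution; the script value
# replaces the default score because boost_mode is "replace".
MAX_TF = 5.0

def capped_tf_query(field, term, max_tf=MAX_TF):
    """Build a function_score query whose script caps tf for one term."""
    script = f"min({max_tf}, _index['{field}']['{term}'].tf())"
    return {
        "query": {
            "function_score": {
                "query": {"match": {field: term}},
                "script_score": {"script": script},
                "boost_mode": "replace",  # use only the script's value
            }
        }
    }

body = capped_tf_query("body", "elasticsearch")
print(json.dumps(body, indent=2))
```

Whether the engine still computes its own tf alongside the script is exactly the question above; the sketch only shows the request shape.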



Re: ES attempting to parse dates automatically (possibly Hadoop related)

2014-03-21 Thread Costin Leau

You have a typo - it should be "default", not "defualt"

On 3/21/14 8:37 PM, Brian Stempin wrote:

Hi Costin,
Thanks for the response -- that sounds like what I need.  I re-ran my job with 
a mapping that included
dynamic_templates, and I'm still having issues.  Here's my mapping...am I using 
dynamic_templates correctly?

Thanks for all of your help so far,
Brian

curl -XPUT 'http://localhost:9200/dau/log/_mapping?pretty' -d '
{
  "log" : {
    "properties" : {
      "csUriParams" : {
        "type" : "object",
        "dynamic_templates" : [{
          "defualt" : {
            "match" : "*",
            "match_mapping_type" : "string",
            "mapping" : {
              "type" : "string",
              "index" : "not_analyzed"
            }
          }
        }],
        "properties" : {
          "aoi" : {"type" : "date", "ignore_malformed" : "true"},
          "zoneid" : {"type" : "long", "ignore_malformed" : "true"},
          "ftr" : {"type" : "date", "ignore_malformed" : "true"},
          "pid" : {"type" : "string", "index" : "not_analyzed"},
          "cid" : {"type" : "string", "index" : "not_analyzed"},
          "ext" : {"type" : "string", "index" : "not_analyzed"},
          "systemid" : {"type" : "string", "index" : "not_analyzed"}
        }
      },
      "rawLog" : {
        "type" : "object",
        "dynamic_templates" : [{
          "defualt" : {
            "match" : "*",
            "match_mapping_type" : "string",
            "mapping" : {
              "type" : "string",
              "index" : "not_analyzed"
            }
          }
        }],
        "properties" : {
          "datetime" : {"type" : "date", "format" : "date_hour_minute_second", "ignore_malformed" : "true"},
          "cs(User-Agent)" : {"type" : "string", "index" : "not_analyzed"},
          "cs-ip" : {"type" : "string", "index" : "not_analyzed"},
          "cs-uri" : {"type" : "string"},
          "os" : {"type" : "string", "index" : "not_analyzed"},
          "browser" : {"type" : "string", "index" : "not_analyzed"}
        }
      }
    }
  }
}
'
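Since the misspelled template name in a mapping like the one above is easy to miss by eye, here is a small Python sketch (purely illustrative, not part of the thread) that walks a mapping dict and renames any "defualt" dynamic-template key to "default":

```python
import json

def fix_template_names(node):
    """Recursively rename misspelled 'defualt' dynamic-template keys."""
    if isinstance(node, dict):
        return {
            ("default" if key == "defualt" else key): fix_template_names(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [fix_template_names(item) for item in node]
    return node

# A trimmed-down version of the mapping shape used above.
mapping = {
    "log": {
        "properties": {
            "csUriParams": {
                "type": "object",
                "dynamic_templates": [
                    {"defualt": {"match": "*", "match_mapping_type": "string"}}
                ],
            }
        }
    }
}

fixed = fix_template_names(mapping)
templates = fixed["log"]["properties"]["csUriParams"]["dynamic_templates"]
print(json.dumps(templates))
```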


On Thu, Mar 20, 2014 at 2:12 PM, Costin Leau <costin.l...@gmail.com> wrote:

Sure - take a look at dynamic_templates - you define one under your
index/type and specify the match for your field.
You can either define the mapping for the fields that you want and leave
the so-called catch-all (*) directive last
or, if you have some kind of naming convention, use that:


http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-root-object-type.html#_dynamic_templates





On 3/20/14 6:32 PM, Brian Stempin wrote:

That's the problem -- it's a web log that contains a URL that could 
have literally anything in it.  Anyone could
put a
base64 value as a random query parameter.  I could have the M/R job 
ignore all fields that I don't explicitly
expect,
but that's not very flexible and prevents me from spotting possible 
abuse or user-error.  Is there any way for me to
disable ES's type-guessing or to provide a default guess?  I'd rather 
have ES default to a string than to fail a
M/R job
because its type-guess was wrong.

Brian


On Thu, Mar 20, 2014 at 12:26 PM, Costin Leau <costin.l...@gmail.com> wrote:

 Then what you could do is to minimize the bulk size to say 100 
documents, turn on logging and run your data
through.
 This way you can catch the 'special' document in the act.

 As for expectations - Elasticsearch tries to guess the field type 
by looking at its value - it seems the base64
 entry looks like a date, hence the error. You can avoid this by 
defining the field (either directly or
through a
 template) in your mapping so it always gets mapped to a string.
 As a rule of thumb, whenever you want full control over the index, 
mapping is the way to do it.



 On 3/20/14 6:10 PM, Brian Stempin wrote:

 I have unit tests for this MR job, and they show that the JSON 
output is a string as I'd expect, so
Gson is most
 likely
 not the cause.

 I'm hesitant to show more code (owned by the work-place), but 
I can describe it a little bit further:

* The mapper gets a W3C log entry
   

Re: Getting [marvel.agent.exporter ] [Tempest] remote target didn't respond with 200 OK response code [404 Not Found

2014-03-21 Thread Boaz Leskes
Hi Dennis,

Do you have action.auto_create_index: false in your elasticsearch.yml? If 
so, see http://www.elasticsearch.org/guide/en/marvel/current/#relevant-settings

Cheers,
Boaz

On Thursday, March 20, 2014 1:35:05 AM UTC+1, Dennis Andersen wrote:
>
> My two clusters, currently default named, Tempest and Morg, appear to be 
> working fine. The head plugin shows green and queries work. I have indices 
> and documents added.
>
> When I added Marvel, restarted both nodes, I see the error
>  [marvel.agent.exporter] [Tempest] remote target didn't respond with 
> 200 OK response code [404 Not Found
> repeated in each cluster's log files (with the names changed accordingly).
>
> Marvel has a blank screen and reports "No results There were no results 
> because no indices were found that match your selected time span".
>
> I am running version 1.0.1 on each server, have Java (build 1.7.0_25-b15) 
> on each. No configuration changes were made -- meaning no marvel.agent 
> properties were set in the configuration.
> I do have 
> discovery.zen.ping.multicast.enabled: false
>
> Any suggestions?
>
> thanks,
>
> Dennis
>
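For reference, the workaround described on that settings page amounts to whitelisting Marvel's indices when automatic index creation is otherwise disabled; a sketch for elasticsearch.yml (pattern as documented for Marvel in that era):

```yaml
# Permit Marvel to create its own indices while keeping
# automatic index creation disabled for everything else.
action.auto_create_index: .marvel-*
```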



Re: Logstash logs

2014-03-21 Thread Ivan Brusic
I do not think there is a default log level in logstash. You can either
exclude events on the client or on the server side.

On the server side, you can simply apply a drop filter to your input:
http://logstash.net/docs/1.4.0/filters/drop

On the client side, it all depends on your application. For my
elasticsearch servers, I use the syslog appender in log4j, so the log level
is determined by the log4j configuration. For other servers, I use logstash
agents (still have to transition to the logstash forwarder), so I would
need to apply the same drop filter in the output.

Perhaps there is a default log level and someone can enlighten us both.

Cheers,

Ivan


On Fri, Mar 21, 2014 at 5:33 AM, Arik Gaisler  wrote:

> What is the default log level in logstash? how do I run with log level set
> to error (meaning only errors shall be logged)
>
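A minimal sketch of the server-side drop approach mentioned above (the field name, values, and conditional are illustrative assumptions; adjust them to how your events are actually tagged):

```
filter {
  # Keep only errors: drop events whose (assumed) severity field
  # is below ERROR. Rename the field to match your pipeline.
  if [severity] in ["DEBUG", "INFO", "WARN"] {
    drop { }
  }
}
```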



Re: Java Serialization of Exceptions

2014-03-21 Thread Chris Berry
If it happened once, then by definition it will happen again. History repeats 
itself. ;-)

What exactly would you lose?
You are simply trading one rigid serialization scheme for another more lenient 
one.
Yes, you would have to introduce something like Jackson's ObjectMapper, but 
that seems to be the de facto standard today, and with your use of the Shade 
Plugin it wouldn't really be a burden on the Client anyway.

With all due respect, you may be trivializing the impact of this one-time bug.
It is difficult, at best, to inform all the Clients of your Cluster: "Hey, if 
you want to see what your Exceptions really are, then upgrade your JVM." 
Especially in large SOA shops.

This just decouples the Client and Server deployments.

Thanks much,
-- Chris 

On Mar 21, 2014, at 12:18 PM, kimchy  wrote:

> I wonder why you are asking for this feature? If its because Java broke 
> backward comp. on serialization of InetAddress that we use in our exceptions, 
> then its a bug in Java serialization, hard for us to do something about it. 
> 
> You will lose a lot by trying to serialize exceptions using JSON, and we 
> prefer not to introduce dependency on ObjectMapper in Jackson, or try and 
> serialize exceptions using Jackson.
> 
> I would be very careful in introducing this just because of a (one time bug) 
> in Java.
> 
> On Friday, March 21, 2014 5:18:38 PM UTC+1, Chris Berry wrote:
> Greetings,
> 
> Let me say up-front, I am a huge fan and proponent of Elasticsearch. It is a 
> beautiful tool.
> 
> So, that said, it surprises me that Elasticsearch has such a pedestrian flaw, 
> and serializes its Exceptions using Java Serialization.
> In a big shop it is quite difficult (i.e. next to impossible) to keep all the 
> ES Clients on the same exact JVM as Elasticsearch, and thus, it is not 
> uncommon to get TransportSerializationExceptions instead of the actual 
> underlying problem.
> I was really hoping this would be corrected in ES 1.0.X, but no such luck. 
> (As far as I can tell...)
> 
> It seems that this is pretty easily fixed?
> Just switch to a JSON representation of the basic Exception and gracefully 
> (forwards-compatibly) attempt to re-hydrate the actual Exception class. 
> You'd just have to drop an additional "header" in the stream that tells the 
> code it is a JSON response and route to the right Handler it accordingly. If 
> the header is missing, then do things the old way with Java Serialization??
> 
> Are there any plans to fix this? Or perhaps to entertain a Pull Request?
> It may seem just an annoyance, but it is actually pretty bad, in that it 
> keeps Clients from seeing their real issues. Especially in shops where it is 
> difficult to see the Production logs of Elasticsearch itself. 
> 
> Thanks much,
> -- Chris 
> 
> 
> 



Re: EC2 Discovery

2014-03-21 Thread IronMan2014
Ok, we are seeing this error in the log, any clues?

Error injecting constructor, java.lang.IllegalStateException: This is a 
proxy used to support circular references involving constructors. The 
object we're proxying is not constructed yet. Please wait until after 
injection has completed to use this object.
  at org.elasticsearch.cache.NodeCache.(Unknown Source)
  while locating org.elasticsearch.cache.NodeCache
Caused by: java.lang.IllegalStateException: This is a proxy used to support 
circular references involving constructors.


On Thursday, March 20, 2014 3:18:11 PM UTC-4, IronMan2014 wrote:
>
> I can't seem to make my EC2 cluster of 2 nodes/instances work.
>
> - If I comment out the section below, I can connect to each instance 
> individually and query it, I am using Sense plugin to query Elastic Search 
> Instance.
>
> - With the sections included as below, I cannot query neither node, I get 
> "Request failed to get to the server (status code: 0):"
>
>
>  In /instance 1/ elasticsearch.yml
> cluster.name: mycluster
> node.name: "node_1"
>
> cloud:
>
>aws:
>
>access_key: 
>
>secret_key: 
>
> discovery:
>
>type: ec2
>
>  In /instance 2/ elasticsearch.yml
> cluster.name: mycluster
> node.name: "node_2"
>
> cloud:
>
>   aws:
>
>   access_key: 
>
>   secret_key: 
>
> discovery:
>
>   type: ec2
>
>
> I also tried with:
>
> discovery.zen.ping.multicast.enabled: false
>
> discovery.zen.ping.unicast.hosts: ["Instance_1 IP:9300", "instance_2 
> IP:9300"]
>
>
> Here is "mycluster.log" from both instances:
>
> $ more /var/log/elasticsearch/mycluster.log 
>
> [2014-03-20 19:00:37,635][INFO ][node ] [node_1]version
> [0.90.10], pid[3520], build[0a5781f/2014-01-10T10:18:37Z]
>
> [2014-03-20 19:00:37,635][INFO ][node ] 
> [node_1]initializing 
> ...
>
> [2014-03-20 19:00:37,698][INFO ][plugins  ] [node_1]loaded 
> [mapper-attachments, cloud-aws], sites []
>
> [2014-03-20 19:01:17,898][INFO ][node ] [node_1]version
> [0.90.10], pid[3594], build[0a5781f/2014-01-10T10:18:37Z]
>
> [2014-03-20 19:01:17,898][INFO ][node ] 
> [node_1]initializing 
> ...
>
> [2014-03-20 19:01:17,961][INFO ][plugins  ] [node_1]loaded 
> [mapper-attachments, cloud-aws], sites []
>
> [2014-03-20 19:03:50,048][INFO ][node ] [node_1]version
> [0.90.10], pid[3671], build[0a5781f/2014-01-10T10:18:37Z]
>
> [2014-03-20 19:03:50,048][INFO ][node ] 
> [node_1]initializing 
> ...
>
> [2014-03-20 19:03:50,111][INFO ][plugins  ] [node_1]loaded 
> [mapper-attachments, cloud-aws], sites []
>
>
> $more /var/log/elasticsearch/mycluster.log 
>
> [2014-03-20 19:00:29,465][INFO ][node ] [node_2]version
> [0.90.10], pid[2800], build[0a5781f/2014-01-10T10:18:37Z]
>
> [2014-03-20 19:00:29,466][INFO ][node ] 
> [node_2]initializing 
> ...
>
> [2014-03-20 19:00:29,528][INFO ][plugins  ] [node_2]loaded 
> [mapper-attachments, cloud-aws], sites []
>
> [2014-03-20 19:01:26,645][INFO ][node ] [node_2]version
> [0.90.10], pid[2874], build[0a5781f/2014-01-10T10:18:37Z]
>
> [2014-03-20 19:01:26,646][INFO ][node ] 
> [node_2]initializing 
> ...
>
> [2014-03- ...
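For comparison, a minimal working shape for cloud-aws EC2 discovery at the time looked roughly like the sketch below. The region, security-group name, and key placeholders are illustrative assumptions, and port 9300 must also be open between the instances:

```yaml
cluster.name: mycluster

cloud:
  aws:
    access_key: <AWS_ACCESS_KEY>
    secret_key: <AWS_SECRET_KEY>
    region: us-east-1

discovery:
  type: ec2
  ec2:
    # Restrict discovery to instances in this security group.
    groups: my-es-group

discovery.zen.ping.multicast.enabled: false
```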



Re: Help setting up logging

2014-03-21 Thread Raphael Miranda
Thank you very much. 

I had assumed the name "slowlog" meant verbose logging, not a directive to 
log only actions that exceed a given threshold. I lowered the threshold to 
1ms and now I can see the logs.

On Thursday, 20 March 2014 22:56:36 UTC-3, Ivan Brusic wrote:
>
> The logging configuration specifies how and what to log, but it does not 
> specify when or what actually constitutes a slow query/index. Not all 
> queries/index requests are logged, just the slow ones. You need to define 
> the threshold in the main elasticsearch.yml config file.
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-slowlog.html
>
> -- 
> Ivan
>
>
> On Thu, Mar 20, 2014 at 6:05 PM, Raphael Miranda 
> 
> > wrote:
>
>> ES is creating the log files upon startup but they are empty? I switched 
>> every log level to DEBUG and it started pouring more log into 
>> elasticsearch.log still, no query or indexing is logged.
>>
>> -rw-r--r-- 1 elasticsearch elasticsearch 0 Mar 21 00:54 
>> elasticsearch_index_indexing_slowlog.log
>> -rw-r--r-- 1 elasticsearch elasticsearch 0 Mar 21 00:54 
>> elasticsearch_index_search_slowlog.log
>> -rw-r--r-- 1 elasticsearch elasticsearch 83910 Mar 21 01:00 
>> elasticsearch.log
>>
>> Here's my logging.yml:
>>
>> # you can override this using by setting a system property, for example 
>> -Des.logger.level=DEBUG
>> es.logger.level: DEBUG
>> rootLogger: ${es.logger.level}, console, file
>> logger:
>>   # log action execution errors for easier debugging
>>   action: DEBUG
>>   # reduce the logging for aws, too much is logged under the default INFO
>>   com.amazonaws: DEBUG
>>
>>   # gateway
>>   gateway: DEBUG
>>   index.gateway: DEBUG
>>
>>   # peer shard recovery
>>   indices.recovery: DEBUG
>>
>>   # discovery
>>   discovery: DEBUG
>>
>>   index.search.slowlog: DEBUG, index_search_slow_log_file
>>   index.indexing.slowlog: DEBUG, index_indexing_slow_log_file
>>
>> additivity:
>>   index.search.slowlog: true
>>   index.indexing.slowlog: true
>> appender:
>>   console:
>> type: console
>> layout:
>>   type: consolePattern
>>   conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
>>
>>   file:
>> type: dailyRollingFile
>> file: ${path.logs}/${cluster.name}.log
>> datePattern: "'.'yyyy-MM-dd"
>> layout:
>>   type: pattern
>>   conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
>>
>>   index_search_slow_log_file:
>> type: dailyRollingFile
>> file: ${path.logs}/${cluster.name}_index_search_slowlog.log
>> datePattern: "'.'yyyy-MM-dd"
>> layout:
>>   type: pattern
>>   conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
>>
>>   index_indexing_slow_log_file:
>> type: dailyRollingFile
>> file: ${path.logs}/${cluster.name}_index_indexing_slowlog.log
>> datePattern: "'.'yyyy-MM-dd"
>> layout:
>>   type: pattern
>>   conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
>>  
>
>
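For anyone else hitting this: the thresholds live in elasticsearch.yml (or per-index settings), not in logging.yml. A sketch using the example values from the linked slow-log page:

```yaml
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
```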



Re: Java Serialization of Exceptions

2014-03-21 Thread kimchy
I wonder why you are asking for this feature? If it's because Java broke 
backward compatibility on serialization of InetAddress, which we use in our 
exceptions, then it's a bug in Java serialization, and hard for us to do 
anything about. 

You would lose a lot by trying to serialize exceptions using JSON, and we 
prefer not to introduce a dependency on Jackson's ObjectMapper, or to try to 
serialize exceptions using Jackson.

I would be very careful about introducing this just because of a (one-time 
bug) in Java.

On Friday, March 21, 2014 5:18:38 PM UTC+1, Chris Berry wrote:
>
> Greetings,
>
> Let me say up-front, I am a huge fan and proponent of Elasticsearch. It is 
> a beautiful tool.
>
> So, that said, it surprises me that Elasticsearch has such a pedestrian 
> flaw, and serializes its Exceptions using Java Serialization.
> In a big shop it is quite difficult (i.e. next to impossible) to keep all 
> the ES Clients on the same exact JVM as Elasticsearch, and thus, it is not 
> uncommon to get TransportSerializationExceptions instead of the actual 
> underlying problem.
> I was really hoping this would be corrected in ES 1.0.X, but no such luck. 
> (As far as I can tell...)
>
> It seems that this is pretty easily fixed?
> Just switch to a JSON representation of the basic Exception and gracefully 
> (forwards-compatibly) attempt to re-hydrate the actual Exception class. 
> You'd just have to drop an additional "header" in the stream that tells 
> the code it is a JSON response and route to the right Handler it 
> accordingly. If the header is missing, then do things the old way with Java 
> Serialization??
>
> Are there any plans to fix this? Or perhaps to entertain a Pull Request?
> It may seem just an annoyance, but it is actually pretty bad, in that it 
> keeps Clients from seeing their real issues. Especially in shops where it 
> is difficult to see the Production logs of Elasticsearch itself. 
>
> Thanks much,
> -- Chris 
>
>
>



Re: Elasticsearch configuration for uninterrupted indexing

2014-03-21 Thread Ivan Brusic
One of the main uses of a data-less node is to act as a coordinator between
the other nodes: it gathers all the responses from the other nodes/shards
and reduces them into one.

In your case, the data-less node is gathering all the data from just one
node. In other words, it is not doing much since the reduce phase is
basically a pass-thru operation. With a two node cluster, I would say you
are better off having both machines act as full nodes.

Cheers,

Ivan



On Fri, Mar 21, 2014 at 5:04 AM, Rujuta Deshpande  wrote:

> Hi,
>
> I am setting up a system consisting of elasticsearch-logstash-kibana for
> log analysis. I am using one machine (2 GB RAM, 2 CPUs) running logstash,
> kibana and  two instances of elasticsearch. Two other machines, each
> running  logstash-forwarder are pumping logs into the ELK system.
>
> The reasoning behind using two ES instances was this - I needed one
> uninterrupted instance to index the incoming logs and I also needed to
> query the currently existing indices. However, I didn't want any complex
> querying to result in loss of events owing to Out of Memory Errors because
> of excessive querying.
>
> So, one elasticsearch node was master = true  and data = true which did
> the indexing (called the writer node) and the other node, was master =
> false and data = false (this was the workhorse or reader node) .
>
> I assumed that, in cases of excessive querying, although the data is
> stored on the writer node, the reader node will query the data and all the
> processing will take place on the reader as a result of which issues like
> out of memory error etc will be avoided and uninterrupted indexing will
> take place.
>
> However, while testing this, I realized that the reader hardly uses the
> heap memory ( Checked this in Marvel )  and when I fire a complex search
> query - which was a search request using the python API where the 'size'
> parameter was set to 1, the writer node throws an out of memory error,
> indicating that the processing also takes place on the writer node only. My
> min and max heap size was set to 256m  for this test. I also ensured that I
> was firing the search query to the port on which the reader node was
> listening (Port 9200). The writer node was running on Port 9201.
>
> Was my previous understanding of the problem incorrect - i.e. having one
> reader and one writer node, doesn't help in uninterrupted indexing of
> documents? If this is so, what is the use of having a separate workhorse or
> reader node?
>
> My eventual aim is to be able to query elasticsearch and fetch large
> amounts of data at a time without interrupting/slowing down the indexing of
> documents.
>
> Thank you.
>
> Rujuta
>



Re: Optimal number of Shards per node

2014-03-21 Thread Rajan Bhatt
Thanks, Zach.

So on a single node, this test tells us how much one node with a single 
shard can handle. If we then want to deploy more shards per node, we need 
to take into consideration that each additional shard consumes more 
resources (file descriptors, memory, etc.) and that performance degrades 
as more shards are added to a node.

This is tricky, and mileage can vary with different workloads (indexing + 
searching).

Would you be able to describe your deployment at a very high level (number 
of ES nodes + number of indices + shards + replicas) to give some idea?
I appreciate your answer and your time.

By the way, which tool do you use for monitoring the ES cluster, and what 
do you monitor?
Thanks
Rajan

On Thursday, March 20, 2014 2:05:52 PM UTC-7, Zachary Tong wrote:
>
> Unfortunately, there is no way that we can tell you an optimal number. 
>  But there is a way that you can perform some capacity tests, and arrive at 
> usable numbers that you can extrapolate from.  The process is very simple:
>
>
>- Create a single index, with a single shard, on a single 
>production-style machine
>- Start indexing *real, production-style* data.  "Fake" or "dummy" 
>data won't work here, it needs to mimic real-world data
>- Periodically, run real-world queries that you would expect users to 
>enter
>- At some point, you'll find that performance is no longer acceptable 
>to you.  Perhaps the indexing rate becomes too slow.  Or perhaps query 
>latency is too slow.  Or perhaps your node just runs out of memory
>- Write down the number of documents in the shard, and the physical 
>size of the shard
>
> Now you know the limit of a single shard given your hardware + queries + 
> data.  Using that knowledge, you can extrapolate given your expected 
> search/indexing load, and how many documents you expect to index over the 
> next few years, etc.
>
> -Zach
>
>
>
> On Thursday, March 20, 2014 3:29:47 PM UTC-5, Rajan Bhatt wrote:
>>
>> Hello,
>>
>> I would appreciate if someone can suggest optimal number of shards per ES 
>> node for optimal performance or any recommended way to arrive at number of 
>> shards given number of core and memory foot print.
>>
>> Thanks in advance
>> Reagards
>> Rajan
>>
>
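The extrapolation step Zach describes is simple arithmetic; a sketch in Python (all numbers below are illustrative assumptions, not measurements from this thread):

```python
import math

def shards_needed(docs_per_shard_limit, expected_docs, replicas=1):
    """Extrapolate shard count from a single-shard capacity test.

    docs_per_shard_limit: docs one shard handled acceptably in the test.
    expected_docs: documents expected over the index's lifetime.
    replicas: replica count (each primary is copied this many times).
    """
    primaries = math.ceil(expected_docs / docs_per_shard_limit)
    total = primaries * (1 + replicas)
    return primaries, total

# Example: suppose the test showed ~50M docs/shard was the comfort limit,
# and we expect ~600M docs over the next two years.
primaries, total = shards_needed(50_000_000, 600_000_000, replicas=1)
print(primaries, total)
```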



Re: Java Serialization of Exceptions

2014-03-21 Thread joergpra...@gmail.com
I would be very interested in that pull request, too.

Changing every exception transport to a textual JSON error seems a proper
alternative. I haven't tried Jackson ObjectMapper on the exception classes
that are present in ES but it should be possible.

Jörg


On Fri, Mar 21, 2014 at 5:18 PM, Chris Berry  wrote:

> Greetings,
>
> Let me say up-front, I am a huge fan and proponent of Elasticsearch. It is
> a beautiful tool.
>
> So, that said, it surprises me that Elasticsearch has such a pedestrian
> flaw, and serializes its Exceptions using Java Serialization.
> In a big shop it is quite difficult (i.e. next to impossible) to keep all
> the ES Clients on the same exact JVM as Elasticsearch, and thus, it is not
> uncommon to get TransportSerializationExceptions instead of the actual
> underlying problem.
> I was really hoping this would be corrected in ES 1.0.X, but no such luck.
> (As far as I can tell...)
>
> It seems that this is pretty easily fixed?
> Just switch to a JSON representation of the basic Exception and gracefully
> (forwards-compatibly) attempt to re-hydrate the actual Exception class.
> You'd just have to drop an additional "header" in the stream that tells
> the code it is a JSON response and route to the right Handler it
> accordingly. If the header is missing, then do things the old way with Java
> Serialization??
>
> Are there any plans to fix this? Or perhaps to entertain a Pull Request?
> It may seem just an annoyance, but it is actually pretty bad, in that it
> keeps Clients from seeing their real issues. Especially in shops where it
> is difficult to see the Production logs of Elasticsearch itself.
>
> Thanks much,
> -- Chris
>
>



Java Serialization of Exceptions

2014-03-21 Thread Chris Berry
Greetings,

Let me say up-front, I am a huge fan and proponent of Elasticsearch. It is 
a beautiful tool.

So, that said, it surprises me that Elasticsearch has such a pedestrian 
flaw, and serializes its Exceptions using Java Serialization.
In a big shop it is quite difficult (i.e. next to impossible) to keep all 
the ES Clients on the same exact JVM as Elasticsearch, and thus, it is not 
uncommon to get TransportSerializationExceptions instead of the actual 
underlying problem.
I was really hoping this would be corrected in ES 1.0.X, but no such luck. 
(As far as I can tell...)

It seems that this is pretty easily fixed?
Just switch to a JSON representation of the basic Exception and gracefully 
(forwards-compatibly) attempt to re-hydrate the actual Exception class. 
You'd just have to drop an additional "header" in the stream that tells the 
code it is a JSON response and route it to the right Handler accordingly. 
If the header is missing, then do things the old way with Java 
Serialization??
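
The scheme described here can be sketched roughly as follows (illustrative Python only, not Elasticsearch code; the header byte, field names, and fallback behaviour are all invented for the example):

```python
import json

JSON_HEADER = 0x01  # hypothetical marker byte; anything else means legacy Java serialization


def serialize_exception(exc):
    """Encode an exception as a header byte plus a small JSON payload."""
    payload = json.dumps({
        "class": type(exc).__name__,
        "message": str(exc),
    }).encode("utf-8")
    return bytes([JSON_HEADER]) + payload


def deserialize_exception(data, known_types):
    """Re-hydrate the exception, degrading gracefully when the class is unknown."""
    if data[0] != JSON_HEADER:
        raise RuntimeError("legacy stream: fall back to Java serialization")
    info = json.loads(data[1:].decode("utf-8"))
    # Forwards-compatible: an exception class this client doesn't know
    # becomes a generic error instead of a deserialization failure.
    exc_type = known_types.get(info["class"], RuntimeError)
    return exc_type(info["message"])


wire = serialize_exception(ValueError("shard failed"))
exc = deserialize_exception(wire, {"ValueError": ValueError})
```

The point of the `known_types` lookup is exactly the forwards-compatibility argued for above: a client on an older version still sees the real message, just wrapped in a generic exception.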

Are there any plans to fix this? Or perhaps to entertain a Pull Request?
It may seem just an annoyance, but it is actually pretty bad, in that it 
keeps Clients from seeing their real issues. Especially in shops where it 
is difficult to see the Production logs of Elasticsearch itself. 

Thanks much,
-- Chris 


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1e02778a-a1d0-44b5-8b1f-e8de7de33668%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Resetting Kibana Dashboard fields

2014-03-21 Thread Terry Healy
Hi- I'm using Kibana 3 to dig through ES logs that are of several types: 
email logs and several types of IDS logs which have been parsed and inserted 
in ES. My problem is that fields selected for one type are always visible - 
even when displaying other types where the fields are not present. The 
"All" display shows (1), and "Current" shows (32) - which is correct for 
the selected type (mail). But 4 columns are present in the table display 
that do not exist in a mail record, and they take up most of the display. 

See attached partial screen grab, where I want to get rid of the 
"category", "code", "description", and "srcIp" columns.

How can I get rid of these unrelated fields?

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1b095c21-c002-460f-960b-46105d7bc466%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: EC2 Discovery

2014-03-21 Thread ZenMaster80
I am not sure if I missed something, but what you mentioned I believe I 
already tried, as shown in my original post.
I can connect from one instance to another.
I can connect to each machine individually and I am able to index and query 
it fine with default configuration without any zen or ec2 settings. But, 
when I turned them on as shown in the post, I got this: "Request failed 
to get to the server (status code: 0):" when trying to query the instance, 
and when I do this, it won't even log anything, it is not getting that far.


On Friday, March 21, 2014 4:46:40 AM UTC-4, Norberto Meijome wrote:
>
> Don't try ec2 discovery until you have tested that:
> - you can connect from one machine to another on port 9300 ( nc as client 
> and server, basic networking/ firewalling)
> - run a simple aws ec2 describe instances call with the API key you plan 
> to use, and you can see the machines you need there. Bonus points for 
> filtering based on the rules you intend to use (sec group, tags). This is 
> to ensure your API keys have the correct access needed.
>
> Once you have those basic steps working, use them on es config.
>
> Make sure you enable ec2 discovery and disable the zen discovery ( it will 
> run first and likely time out and ec2 disco won't get to exec). 
>
> The other thing to watch out for is contacting nodes which are too busy to 
> ack your new nodes request for cluster info...but that would be a problem 
> with zen disco too.
> On 21/03/2014 12:31 PM, "Raphael Miranda" > 
> wrote:
>
>> are both machines in the same security group?
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/eb8bb939-3b9d-4f5b-a45c-3d529f75983e%40googlegroups.com
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/48571118-fd84-45da-9aaf-0314c936336b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Indexisto.com - hosted elastic search with admin panel and JS search string for sites

2014-03-21 Thread Mattias Nordberg
Hi Julie,

Facetflow (https://facetflow.com/) is a hosted search solution (based on 
Elasticsearch) which supports Azure. 

Generally you would have your website send the data over the Elasticsearch 
API to have it indexed by the search engine. We do, however, support:
- Synonyms
- Multilingual indices by having the language analyser specified on 
document level
- Variable term weighting by boosting terms in searches, on field level if 
needed
- Multi-tenancy by creating multiple indices (one per e-commerce site for 
example) or scaling one index by sharding the data per customer

Let me know if you have any further questions.

Best regards,
Mattias

On Friday, July 19, 2013 5:02:33 AM UTC+2, Julie Setbon wrote:
>
> Hi Andrew, congrats on the new service. I'm interested in hearing about 
> Azure solutions for faceted search. Are you supporting Azure? Is 
> a back-end (database) necessary from the start to make it work or is it 
> possible to use it with flat files (html, css, js) built in Zurb Foundation 
> 4 ?
>
> I'm interested in the following within an Azure environment:
>
> - support for thesaurus and multi-word equivalences
> -support for RSS feeds of changes
> -single multilingual index without a need to modify the architecture
> - variable term weighting that can also apply to fields
> - multi-tenant ecommerce scenario (potential for future)
>  
> Julie
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d4a1344d-0a88-42f1-9e0d-c4ae4721b168%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Indexing m:n relations

2014-03-21 Thread Pat
Hi,

I would like to index documents of type A and B that have an m:n relation 
between them (indexing 1:n is pretty straightforward using parent/child 
relationships or nested documents).
For example, I would like to find all documents of type A that have a 
related document B which itself has a property x = 123. Currently I nest B 
within A, but every time a specific B object is referenced by an A object, 
it gets stored in the index again as a separate document (plus all 
documents that B references itself). Since the data model contains many 
relations, the index gets very large and building queries spanning the 
nested structures is rather complex and error-prone.
I really would like to index A and B as separate documents which contain a 
reference to each other. But this makes something like a filtered query 
across different document types necessary. 
Something like "Search for all Bs that have x = 123 and use the results as 
a filter on A". I could do that within my application, but I would need to 
retrieve ALL hits for the first query on B (since the results need to be 
sorted on some property of A) and use the results to build a filter for the 
second query on A. Two queries and probably lots of data exchanged between 
my application and Elasticsearch.
Is there a simpler solution for this?
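
The application-side two-query join described above looks roughly like this (a sketch with in-memory lists standing in for the two Elasticsearch queries; the field names `b_refs` and `sort_key` are invented for the example):

```python
# Stand-ins for the two document types; in practice these would be ES hits.
docs_b = [
    {"id": "b1", "x": 123},
    {"id": "b2", "x": 456},
]
docs_a = [
    {"id": "a1", "b_refs": ["b1"], "sort_key": 2},
    {"id": "a2", "b_refs": ["b2"], "sort_key": 1},
    {"id": "a3", "b_refs": ["b1", "b2"], "sort_key": 3},
]

# Query 1: all Bs with x == 123 (a filtered query on the B documents).
matching_b_ids = {b["id"] for b in docs_b if b["x"] == 123}

# Query 2: all As referencing any matching B, sorted on a property of A.
# This is the step that forces fetching ALL B hits first, since the sort
# lives on A and cannot be applied until the filter set is complete.
result = sorted(
    (a for a in docs_a if matching_b_ids & set(a["b_refs"])),
    key=lambda a: a["sort_key"],
)
```

As the post notes, the cost is one round trip per side of the relation plus shipping every matching B id to the application before the second query can be built.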

Regards,
Pat

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0d9877f0-ed40-4606-ba37-85b2998ee907%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: EC2 Discovery

2014-03-21 Thread ZenMaster80
I am not sure if I missed something, but what you mentioned I believe I 
already tried as showing in my original post.
I can connect to each machine individually and I am able to index and query 
it fine with default configuration without any zen or ec2 settings. But, 
when I turned them on as shown in the post, I got this: "Request failed 
to get to the server (status code: 0):" when trying to query the instance.
Did you mean I should try to see if I can access one instance from the 
other? This I didn't try yet.

On Friday, March 21, 2014 4:46:40 AM UTC-4, Norberto Meijome wrote:
>
> Don't try ec2 discovery until you have tested that:
> - you can connect from one machine to another on port 9300 ( nc as client 
> and server, basic networking/ firewalling)
> - run a simple aws ec2 describe instances call with the API key you plan 
> to use, and you can see the machines you need there. Bonus points for 
> filtering based on the rules you intend to use (sec group, tags). This is 
> to ensure your API keys have the correct access needed.
>
> Once you have those basic steps working, use them on es config.
>
> Make sure you enable ec2 discovery and disable the zen discovery ( it will 
> run first and likely time out and ec2 disco won't get to exec). 
>
> The other thing to watch out for is contacting nodes which are too busy to 
> ack your new nodes request for cluster info...but that would be a problem 
> with zen disco too.
> On 21/03/2014 12:31 PM, "Raphael Miranda" > 
> wrote:
>
>> are both machines in the same security group?
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/eb8bb939-3b9d-4f5b-a45c-3d529f75983e%40googlegroups.com
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7c9e9da8-6efe-4005-8a69-c00daa6ec711%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Fast vector highlighter does not work with explicit span_near queries

2014-03-21 Thread Harry Waye
FYI this is ES 1.0.1

On Friday, March 21, 2014 1:00:33 PM UTC, Harry Waye wrote:
>
> I'm trying to use fvh with span_near queries but it appears to be totally 
> broken.  Other query types work, even its query_string equivalent.  Is 
> there anything I am doing incorrectly here?  Or is there a workaround that 
> I can employ in the meantime?  Below is a recreation:
>
> # Set up index with mappings
> curl -XPOST localhost:9200/a -d '{
>   "mappings": {
> "document": {
>   "properties": {
> "text": {
>   "type": "string", 
>   "term_vector": "with_positions_offsets"
> }
>   }
> }
>   }
> }'
>  # Put text to field with positions offsets
> curl -XPOST localhost:9200/a/document/1 -d '{"text": "a b"}'
>  # Query with fvh highlighter gives no highlight
> curl -XPOST localhost:9200/a/document/_search -d '{
>   "query": {
> "span_near": {
>   "slop": 0, 
>   "clauses": [{"span_term": {"text": "a"}}, {"span_term": {"text": "b"}}]
> }
>   }, 
>   "highlight": {"fields": {"text": {"type":"fvh"}}}
> }'
>  # 
> {"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.22145195,"hits":[{"_index":"a","_type":"document","_id":"1","_score":0.22145195,
>  "_source" : {"text": "a b"}}]}}
>  # Query with plain
> curl -XPOST localhost:9200/a/document/_search -d '{
>   "query": {
> "span_near": {
>   "slop": 0,
>   "clauses": [{"span_term": {"text": "a"}}, {"span_term": {"text": "b"}}]
> }
>   },
>   "highlight": {"fields": {"text": {"type":"plain"}}}
> }'
>  # 
> {"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.22145195,"hits":[{"_index":"a","_type":"document","_id":"1","_score":0.22145195,
>  "_source" : {"text": "a b"},"highlight":{"text":["a 
> b"]}}]}}
>  curl -XPOST localhost:9200/a/document/_search -d '{
>   "query": {
> "query_string": {
>   "query": "\"a b\"~0",
>   "default_field": "text"
> }
>   },
>   "highlight": {"fields": {"text": {"type":"fvh"}}}
> }'
>  # 
> {"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.38356602,"hits":[{"_index":"a","_type":"document","_id":"1","_score":0.38356602,
>  "_source" : {"text": "a b"},"highlight":{"text":["a b"]}}]}}
>  # Try a match query
> curl -XPOST localhost:9200/a/document/_search -d '{
>   "query": {
> "match": {
>   "text": "a b"
> }
>   },
>   "highlight": {"fields": {"text": {"type":"fvh"}}}
> }'
>  # 
> {"took":14,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.2712221,"hits":[{"_index":"a","_type":"document","_id":"1","_score":0.2712221,
>  "_source" : {"text": "a b"},"highlight":{"text":["a 
> b"]}}]}}
>
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0b86427d-1033-493b-a874-0411f3b77ec4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Fast vector highlighter does not work with explicit span_near queries

2014-03-21 Thread Harry Waye
I'm trying to use fvh with span_near queries but it appears to be totally 
broken.  Other query types work, even its query_string equivalent.  Is 
there anything I am doing incorrectly here?  Or is there a workaround that 
I can employ in the meantime?  Below is a recreation:

# Set up index with mappings
curl -XPOST localhost:9200/a -d '{
  "mappings": {
"document": {
  "properties": {
"text": {
  "type": "string", 
  "term_vector": "with_positions_offsets"
}
  }
}
  }
}'
 # Put text to field with positions offsets
curl -XPOST localhost:9200/a/document/1 -d '{"text": "a b"}'
 # Query with fvh highlighter gives no highlight
curl -XPOST localhost:9200/a/document/_search -d '{
  "query": {
"span_near": {
  "slop": 0, 
  "clauses": [{"span_term": {"text": "a"}}, {"span_term": {"text": "b"}}]
}
  }, 
  "highlight": {"fields": {"text": {"type":"fvh"}}}
}'
 # 
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.22145195,"hits":[{"_index":"a","_type":"document","_id":"1","_score":0.22145195,
 "_source" : {"text": "a b"}}]}}
 # Query with plain
curl -XPOST localhost:9200/a/document/_search -d '{
  "query": {
"span_near": {
  "slop": 0,
  "clauses": [{"span_term": {"text": "a"}}, {"span_term": {"text": "b"}}]
}
  },
  "highlight": {"fields": {"text": {"type":"plain"}}}
}'
 # 
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.22145195,"hits":[{"_index":"a","_type":"document","_id":"1","_score":0.22145195,
 "_source" : {"text": "a b"},"highlight":{"text":["a b"]}}]}}
 curl -XPOST localhost:9200/a/document/_search -d '{
  "query": {
"query_string": {
  "query": "\"a b\"~0",
  "default_field": "text"
}
  },
  "highlight": {"fields": {"text": {"type":"fvh"}}}
}'
 # 
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.38356602,"hits":[{"_index":"a","_type":"document","_id":"1","_score":0.38356602,
 "_source" : {"text": "a b"},"highlight":{"text":["a b"]}}]}}
 # Try a match query
curl -XPOST localhost:9200/a/document/_search -d '{
  "query": {
"match": {
  "text": "a b"
}
  },
  "highlight": {"fields": {"text": {"type":"fvh"}}}
}'
 # 
{"took":14,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.2712221,"hits":[{"_index":"a","_type":"document","_id":"1","_score":0.2712221,
 "_source" : {"text": "a b"},"highlight":{"text":["a b"]}}]}}



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bc11d0c7-119b-410d-9fb4-ee4c72c6ee5a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Using ES for web analytics

2014-03-21 Thread Jan van Vlimmeren
And of course right after posting you come up with the answer and feel like 
a complete idiot.  Just in case anyone else runs into the issue:

You need to specify in the mapping that the URL field should be 
not_analyzed ("index": "not_analyzed")
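
The bucket keys in the odd output (http, www.standaard.be, cnt, utm_campaign, ...) are the individual tokens the analyzer produced from each URL, which is why whole URLs never appear. A rough simulation of that tokenization (illustrative only; the real standard analyzer's rules are more involved than this regex):

```python
import re


def rough_standard_tokenize(text):
    """Crude stand-in for an analyzer: split on non-word characters,
    keeping dots inside host-like tokens, and lowercase everything."""
    return [t.lower() for t in re.findall(r"[\w.]+", text) if t.strip(".")]


url = "http://www.standaard.be/cnt/dmf20140321_01034888?utm_campaign=x"
tokens = rough_standard_tokenize(url)
# A terms aggregation on an analyzed "url" field buckets per token, so you
# get buckets for "http", "www.standaard.be", "cnt", "utm_campaign", ...
# Mapping the field as not_analyzed keeps the whole URL as a single term.
```

With `"index": "not_analyzed"` in the mapping, the field is stored as one term and the terms aggregation buckets per unique URL as expected.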

On Friday, March 21, 2014 1:33:27 PM UTC+1, Jan van Vlimmeren wrote:
>
> Hi everyone,
>
> I'm brand new to ES and am trying to use it to create a basic analytics 
> app.
>
> I'm running into a problem that I can't seem to get sorted out myself:
>
> I have documents like this:
> {
> _index: hpstats
> _type: articles
> _id: http://www.standaard.be/cnt/dmf20140321_01034888-2014-03-21-11-07
> _version: 10
> _score: 1
> _source: {
> url: http://www.standaard.be/cnt/dmf20140321_01034888
> count: 2
> title: Busverkeer Vlaams-Brabant verstoord door staking in Asse
> created: 2014-03-21T11:07:33+00:00
> lastview: 2014-03-21T11:07:58+00:00
> views: 9
> site: standaard.be
> globalviews: 1}}
>
> For each url, a new document is created every minute that gathers the 
> count, views and globalviews for that url during that minute.  What I want 
> is for each url the lifetime count, views and globalviews.  I tried using 
>
> {
>   "aggs": {
> "urls": {
>   "terms": {
> "field": "url"
>   },
>   "aggs": {
> "count": {
>   "sum": {
> "field": "count"
>   }
> },
> "views": {
>   "sum": {
> "field": "views"
>   }
> },
> "globalviews": {
>   "sum": {
> "field": "globalviews"
>   }
> }
>   }
> }
>   }
> }
>
> Unfortunately this returns odd results.  I would expect to see each unique 
> url but that's not what happens, I get the following;
>
> aggregations (bucket list flattened into a table):
>
>   key                doc_count   count   globalviews     views
>   http                   24503   56458        608164   2952759
>   www.standaard.be       14018   45973        320963   2679508
>   cnt                     9172   41127        216736   1416645
>   utm_campaign            8371    8371        228334    172170
>   utm_medium              8371    8371        228334    172170
>   utm_source              8371    8371        228334    172170
>   standaard               8305    8305        226994    172098
>   utm_term                7190    7190        197773     63153
>   article                 6706    6706        182001     47291
>   crosspromoreg           6684    6684        181921     47269
>
>

Shingle Filter behaviour in Elasticsearch 1.0

2014-03-21 Thread Jorj Ives
Hello All,

I'm trying to figure out a little problem in ES 1.0.

When using 0.90 I had a filter that pushes words together by shingling 
with an empty token separator; however, since upgrading to 1.0 it seems to 
switch to a space when the token_separator field is empty (i.e. 
"token_separator": "").

I've managed to get around this by pattern-replacing spaces with "" after 
it's shingled.

I just wanted to check if this was the expected behaviour now?
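
The behaviour and the workaround can be illustrated with a small simulation (plain Python mimicking a two-term shingle, not the Lucene filter itself):

```python
def shingle(tokens, size=2, separator=" "):
    """Toy shingle filter: join each run of `size` adjacent tokens."""
    return [
        separator.join(tokens[i:i + size])
        for i in range(len(tokens) - size + 1)
    ]


tokens = ["elastic", "search"]

# Pre-1.0 behaviour with an empty separator: tokens pushed together.
assert shingle(tokens, separator="") == ["elasticsearch"]

# 1.0 behaviour as described above: the empty separator acts like a space...
joined_with_space = shingle(tokens, separator=" ")  # ["elastic search"]

# ...so the workaround is a pattern_replace-style step stripping the spaces.
worked_around = [t.replace(" ", "") for t in joined_with_space]
```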

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2f64a4ef-2bab-4bf1-ac09-ccebf3a6018c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: UnitTesting with Elasticsearch and issue with embedded elastic search node

2014-03-21 Thread Ramdev Wudali
The get response is like so :

GetResponse gr = client.prepareGet("testindex", "testDocument", 
> testDocument.getDocumentId()).execute().actionGet();


The searchResponse is like so :

SearchResponse sr = 
>> client.prepareSearch("testindex").setTypes("testDocument")
>
> 
>> .setSearchType(SearchType.QUERY_AND_FETCH).setQuery(QueryBuilders.matchAllQuery())
>
> .setExplain(true).execute().actionGet();
>
>
>
Is there a commit-like function call needed after indexing, without which 
the documents are not searchable (like a flush or refresh)?
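
The gap between GET and search here matches Elasticsearch's near-real-time behaviour: a GET reads the document in real time, while a search only sees it after a refresh (automatic every second by default, or forced via the refresh API, e.g. something like `client.admin().indices().prepareRefresh().execute().actionGet()` in the Java client). The concept can be sketched with a toy model (a simulation of the idea, not Elasticsearch internals):

```python
class ToyIndex:
    """Toy near-real-time index: get() is realtime, search() only sees
    documents made visible by refresh()."""

    def __init__(self):
        self.pending = {}   # indexed but not yet searchable
        self.visible = {}   # searchable after refresh

    def index(self, doc_id, doc):
        self.pending[doc_id] = doc

    def get(self, doc_id):
        # Like ES GET: realtime, sees the just-indexed document too.
        return self.pending.get(doc_id) or self.visible.get(doc_id)

    def search(self):
        # Like ES search: only sees refreshed documents.
        return list(self.visible.values())

    def refresh(self):
        self.visible.update(self.pending)
        self.pending.clear()


idx = ToyIndex()
idx.index("1", {"text": "a b"})
found_by_get = idx.get("1")        # returns the document
found_by_search = idx.search()     # empty until refresh
idx.refresh()
found_after_refresh = idx.search()
```

Under this model a unit test that indexes and immediately searches must refresh (or wait) in between, which would explain the empty match_all result.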

Thanks

Ramdev
 

On Friday, 21 March 2014 07:26:20 UTC-5, Ramdev Wudali wrote:
>
> Hi All:
>I have been trying to write a unitTest that tests some indexing and 
> search functionality  against an embedded instance of elastic search so 
> that I do not have to rely on a test instance of elastic search being 
> available.  Towards this goal, I create a local node instance like so :
>
> ImmutableSettings.Builder elasticsearchSettings =
>>
>> ImmutableSettings.settingsBuilder()
>>
>>  .put("http.enabled",false)
>>
>>  .put("path.data",dataDirectory);
>>
>>
>>> node = 
>>> nodeBuilder().local(true).settings(elasticsearchSettings.build()).node();
>>
>>
>>
> and I get a client instance like so:
>
> Client client = node.client();
>
>
> So far so good. I proceed to index a single document like so :
>
> client.prepareIndex("testindex", "testDocument", 
>>> testDocument.getDocumentId())
>>
>> 
>>>  .setSource(jsonStringnormalized).execute().actionGet();
>>
>>
>> where  jsonStringnormalized is the JSON representation of the 
> testDocument and testDocument.getDocumentId() returns the ID for the 
> document.
>
> I then perform two requests: first a Get request (using GetResponse) and 
> second a search request (using SearchResponse).
> The GetResponse returns the document I just indexed. However, my search 
> request (whose query is QueryBuilders.matchAllQuery()) returns no 
> documents.
>
> My question is this: what is the difference between the GetResponse 
> request and a SearchResponse request? 
> Is there a way for the SearchResponse to succeed in this environment?
>
> Thanks much
>
> Ramdev
>
>
>  
>  
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bff8712e-482c-4d0b-a51a-81ffc43597e8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Using ES for web analytics

2014-03-21 Thread Jan van Vlimmeren
Hi everyone,

I'm brand new to ES and am trying to use it to create a basic analytics app.

I'm running into a problem that I can't seem to get sorted out myself:

I have documents like this:
{
_index: hpstats
_type: articles
_id: http://www.standaard.be/cnt/dmf20140321_01034888-2014-03-21-11-07
_version: 10
_score: 1
_source: {
url: http://www.standaard.be/cnt/dmf20140321_01034888
count: 2
title: Busverkeer Vlaams-Brabant verstoord door staking in Asse
created: 2014-03-21T11:07:33+00:00
lastview: 2014-03-21T11:07:58+00:00
views: 9
site: standaard.be
globalviews: 1}}

For each url, a new document is created every minute that gathers the 
count, views and globalviews for that url during that minute.  What I want 
is for each url the lifetime count, views and globalviews.  I tried using 

{
  "aggs": {
"urls": {
  "terms": {
"field": "url"
  },
  "aggs": {
"count": {
  "sum": {
"field": "count"
  }
},
"views": {
  "sum": {
"field": "views"
  }
},
"globalviews": {
  "sum": {
"field": "globalviews"
  }
}
  }
}
  }
}

Unfortunately this returns odd results.  I would expect to see each unique 
url but that's not what happens, I get the following;

aggregations (bucket list flattened into a table):

  key                doc_count   count   globalviews     views
  http                   24503   56458        608164   2952759
  www.standaard.be       14018   45973        320963   2679508
  cnt                     9172   41127        216736   1416645
  utm_campaign            8371    8371        228334    172170
  utm_medium              8371    8371        228334    172170
  utm_source              8371    8371        228334    172170
  standaard               8305    8305        226994    172098
  utm_term                7190    7190        197773     63153
  article                 6706    6706        182001     47291
  crosspromoreg           6684    6684        181921     47269


Anyone have an idea how I can get the results I would expect?



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6f56b437-0583-442e-af05-a4b29cfd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Logstash logs

2014-03-21 Thread Arik Gaisler
What is the default log level in logstash? how do I run with log level set 
to error (meaning only errors shall be logged)

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ed8186e0-c413-44ff-b775-8868c790f70b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


UnitTesting with Elasticsearch and issue with embedded elastic search node

2014-03-21 Thread Ramdev Wudali
Hi All:
   I have been trying to write a unitTest that tests some indexing and 
search functionality  against an embedded instance of elastic search so 
that I do not have to rely on a test instance of elastic search being 
available.  Towards this goal, I create a local node instance like so :

ImmutableSettings.Builder elasticsearchSettings =
>
> ImmutableSettings.settingsBuilder()
>
>  .put("http.enabled",false)
>
>  .put("path.data",dataDirectory);
>
>
>> node = 
>> nodeBuilder().local(true).settings(elasticsearchSettings.build()).node();
>
>
>
and I get a client instance like so:

Client client = node.client();


So far so good. I proceed to index a single document like so :

client.prepareIndex("testindex", "testDocument", 
>> testDocument.getDocumentId())
>
> 
>>  .setSource(jsonStringnormalized).execute().actionGet();
>
>
> where  jsonStringnormalized is the JSON representation of the testDocument 
and testDocument.getDocumentId() returns the ID for the document.

I then perform two requests: first a Get request (using GetResponse) and 
second a search request (using SearchResponse).
The GetResponse returns the document I just indexed. However, my search 
request (whose query is QueryBuilders.matchAllQuery()) returns no 
documents.

My question is this: what is the difference between the GetResponse 
request and a SearchResponse request? 
Is there a way for the SearchResponse to succeed in this environment?

Thanks much

Ramdev


 
 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/695422a4-0f8f-4877-9bfd-feb56292ab90%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Elasticsearch configuration for uninterrupted indexing

2014-03-21 Thread Rujuta Deshpande
Hi, 

I am setting up a system consisting of elasticsearch-logstash-kibana for 
log analysis. I am using one machine (2 GB RAM, 2 CPUs) running logstash, 
kibana and  two instances of elasticsearch. Two other machines, each 
running  logstash-forwarder are pumping logs into the ELK system. 

The reasoning behind using two ES instances was this - I needed one 
uninterrupted instance to index the incoming logs and I also needed to 
query the currently existing indices. However, I didn't want any complex 
querying to result in loss of events owing to Out of Memory Errors because 
of excessive querying. 

So, one elasticsearch node was master = true  and data = true which did the 
indexing (called the writer node), and the other node was master = false 
and data = false (this was the workhorse or reader node).

I assumed that, in cases of excessive querying, although the data is stored 
on the writer node, the reader node will query the data and all the 
processing will take place on the reader as a result of which issues like 
out of memory error etc will be avoided and uninterrupted indexing will 
take place. 

However, while testing this, I realized that the reader hardly uses the 
heap memory (checked this in Marvel), and when I fire a complex search 
query - which was a search request using the python API where the 'size' 
parameter was set to 1, the writer node throws an out of memory error, 
indicating that the processing also takes place on the writer node only. My 
min and max heap size was set to 256m  for this test. I also ensured that I 
was firing the search query to the port on which the reader node was 
listening (Port 9200). The writer node was running on Port 9201.  

Was my previous understanding of the problem incorrect - i.e., does having 
one reader and one writer node not help with uninterrupted indexing of 
documents? If so, what is the use of having a separate workhorse or 
reader node? 
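The writer/reader split described above maps onto node role settings; here is a sketch of the two elasticsearch.yml files, assuming ES 1.x setting names (the cluster name and ports are illustrative, not taken from the thread):

```yaml
# elasticsearch.yml for the writer node: master-eligible, holds data.
cluster.name: elk-demo          # illustrative; must match on both nodes
node.master: true
node.data: true
http.port: 9201

# --- elasticsearch.yml for the reader node (separate file) ---
# node.master: false            # never elected master
# node.data: false              # holds no shards; pure coordinating node
# http.port: 9200
```

Note that a node with node.data: false only coordinates requests: it scatters the query to the shards and merges the per-shard results, but the shard-level query and fetch phases still run on the node that holds the data (the writer), which is consistent with the heap behaviour observed above.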

My eventual aim is to be able to query elasticsearch and fetch large 
amounts of data at a time without interrupting/slowing down the indexing of 
documents. 

Thank you. 

Rujuta 



Re: Java issue when trying to send requests to ElasticSearch

2014-03-21 Thread charles
Oh, OK, thanks...

I had to update ES because the version of Kibana I am using didn't support 
the previous one. I guess I'll have to downgrade everything or 
wait.

Thanks a lot!

On Friday, March 21, 2014 6:01:51 PM UTC+7, Kevin Wang wrote:
>
> It looks like you are using "elasticsearch-http-basic" plugin and that 
> plugin doesn't support ES 1.0
> https://github.com/Asquera/elasticsearch-http-basic/issues/9
>
>
> On Friday, March 21, 2014 9:50:02 PM UTC+11, cha...@pocketplaylab.comwrote:
>>
>> Hi all,
>>
>> I am currently trying to set up a complete ElasticSearch + LogStash + 
>> Kibana stack on Amazon Web Services OpsWorks using the following tutorial : 
>> http://devblog.springest.com/complete-logstash-stack-on-aws-opsworks-in-15-minutes/
>>
>> Most of the things run fine except for ElasticSearch. When the process is 
>> started, if I try to do a simple curl -X GET http://localhost:9200/, 
>> I get the following answer: curl: (52) Empty reply from server
>>
>> In my cluster's log, I see the hereunder java error. Did anybody 
>> experience that ? Any suggestions ?
>>
>> Thanks for your help,
>>
>> Charles.
>>

Re: Java issue when trying to send requests to ElasticSearch

2014-03-21 Thread Kevin Wang
It looks like you are using "elasticsearch-http-basic" plugin and that 
plugin doesn't support ES 1.0
https://github.com/Asquera/elasticsearch-http-basic/issues/9


On Friday, March 21, 2014 9:50:02 PM UTC+11, cha...@pocketplaylab.com wrote:
>
> Hi all,
>
> I am currently trying to set up a complete ElasticSearch + LogStash + 
> Kibana stack on Amazon Web Services OpsWorks using the following tutorial : 
> http://devblog.springest.com/complete-logstash-stack-on-aws-opsworks-in-15-minutes/
>
> Most of the things run fine except for ElasticSearch. When the process is 
> started, if I try to do a simple curl -X GET http://localhost:9200/, 
> I get the following answer: curl: (52) Empty reply from server
>
> In my cluster's log, I see the hereunder java error. Did anybody 
> experience that ? Any suggestions ?
>
> Thanks for your help,
>
> Charles.
>

Java issue when trying to send requests to ElasticSearch

2014-03-21 Thread charles
Hi all,

I am currently trying to set up a complete ElasticSearch + LogStash + 
Kibana stack on Amazon Web Services OpsWorks using the following tutorial 
: 
http://devblog.springest.com/complete-logstash-stack-on-aws-opsworks-in-15-minutes/

Most of the things run fine except for ElasticSearch. When the process is 
started, if I try to do a simple curl -X GET http://localhost:9200/, I 
get the following answer: curl: (52) Empty reply from server

In my cluster's log, I see the Java error below. Did anybody experience 
this? Any suggestions?

Thanks for your help,

Charles.

Java error :

[2014-03-21 10:46:48,657][WARN ][http.netty   ] [Cecilia Reyes] Caught exception while handling client http traffic, closing connection [id: 0xf290eec5, /127.0.0.1:60355 => /127.0.0.1:9200]
java.lang.IncompatibleClassChangeError: Found class org.elasticsearch.http.HttpRequest, but interface was expected
    at com.asquera.elasticsearch.plugins.http.HttpBasicServer.shouldLetPass(HttpBasicServer.java:43)
    at com.asquera.elasticsearch.plugins.http.HttpBasicServer.internalDispatchRequest(HttpBasicServer.java:35)
    at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
    at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:291)
    at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:43)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)


Re: User defined dictionary in lingo3g for Elasticsearch wrt label/word/synonym

2014-03-21 Thread Prashant Agrawal
Hi Dawid,

>> so if you specify your own resources these will have a priority. 
Do the custom resources (if specified) merely take priority, or do they
override the default ones entirely?

How will clustering happen in the scenarios below?

1) Default resources are enabled; custom resources (having empty tags, i.e.
   ... for all) are present in the config dir.
   Will clustering use only the custom resources, or both?

2) Default resources are disabled via the
   *use-built-in-word-database-for-label-filtering* attribute; custom
   resources (having empty tags, i.e. ... for all) are present in the
   config dir.

3) Default resources are disabled via the
   *use-built-in-word-database-for-label-filtering* attribute; no custom
   resources are present in the config.

I am most interested in how clustering will happen in cases 2 and 3,
where the defaults are disabled and the custom resources are either empty
or absent.



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/User-defined-dictionary-in-lingo3g-for-Elasticsearch-wrt-label-word-synonym-tp4052442p4052462.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: BigDecimal support

2014-03-21 Thread mooky
Trying to submit a pull request. Getting a 403  
-Nick

On Monday, 10 March 2014 17:28:24 UTC, mooky wrote:
>
> Righto - I will try add some.
> -Nick
>
> On Wednesday, 5 March 2014 13:48:58 UTC, Jörg Prante wrote:
>>
>> Yes, there are no tests yet.
>>
>> Jörg
>>
>>
>> On Wed, Mar 5, 2014 at 2:24 PM, mooky  wrote:
>>
>>> I am ready to create a pull request - it's actually quite a simple change.
>>> However, I can't find *any* tests for the existing BigDecimal support 
>>> ... does that sound right?
>>>
>>> -Nick
>>>
>>>
>>>
>>> On Friday, 28 February 2014 12:09:00 UTC, mooky wrote:

 XContentBuilder has support for BigDecimal, but:

1. If you pass the source as a Map when indexing, the BigDecimal
   handling doesn't get invoked
   (https://github.com/elasticsearch/elasticsearch/issues/5260).
2. The existing handling should delegate through to Jackson's
   handling of BigDecimal (which can be configured to serialise
   BigDecimal in a lossless fashion - I don't think that feature
   existed when I last had to worry about it).

 Looking at the code now, I think it's actually an easy change - I
 will see if I can create a pull request.

 -Nick


 On Wednesday, 26 February 2014 17:28:29 UTC, Jörg Prante wrote:
>
> ES accepts BigDecimal input. You can specify scale and rounding mode 
> to format the BigDecimal. 
>
> https://github.com/jprante/elasticsearch/commit/
> 8ef8cd149b867e3e45bc3055dfd6da80e4e9c7ec
>
> Internally, BigDecimal is automatically converted to a JSON string if 
> the number does not fit into double format. Because numbers are useful in 
> Lucene for range searches, they have an advantage.
>
> But I agree, another option could be to enforce string conversion in 
> any case, for example storing currency values as strings for financial 
> services, without arithmetic operations in the index.
>
> Maybe the toEngineeringString() was not a smart decision and 
> toPlainString() works better.
>
> So I would welcome improvements, or should I suggest one in a pull 
> request?
>
> Jörg
>
>
>
> On Wed, Feb 26, 2014 at 6:05 PM, mooky  wrote:
>
>> In financial services space, we almost never use float/double in our 
>> domain - we always use BigDecimal.
>>
>> In elastic, I would like to be able to index/store BigDecimal in a 
>> lossless manner (ie what I get back from _source has the same precision, 
>> etc as what I put in).
>>
>> When I have had to preserve the json serialisation of BigDecimal, I 
>> have usually had custom serialiser/deserialisers that printed it out as 
>> a 
>> json number - but whose textual value was toPlainString(). When 
>> deserialising, creating the BigDecimal with the string value (e.g. 
>> '42.5400') maintained the precision that was originally serialised
>> e.g.
>>
>> {
>>   verySmallNumber : 0.012000,
>>   otherNumber : 42.5400
>> }
>>
>> Perhaps elastic could index bigdecimal as a double - but store it in 
>> the source in a lossless fashion.
>> It would require a user setting, I guess, to treat all floating point 
>> numbers as BigDecimal.
>>
>> Thoughts?
>>
>> -- 
>> You received this message because you are subscribed to the Google 
>> Groups "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, 
>> send an email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit https://groups.google.com/d/
>> msgid/elasticsearch/b54dfd5a-3a0e-4946-aa5f-28b3794a92ac%
>> 40googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>  -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/b8463a21-c997-4269-ae52-992caae88ced%40googlegroups.com
>>> .
>>>
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>
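The lossless round-trip discussed above hinges on BigDecimal's string forms; here is a minimal standalone sketch (the values are illustrative, not from the thread):

```java
import java.math.BigDecimal;

public class BigDecimalRoundTrip {
    public static void main(String[] args) {
        // A value with trailing zeros: scale 4 is part of the value's identity.
        BigDecimal precise = new BigDecimal("42.5400");

        // Serialising via toPlainString() keeps the scale, so parsing the
        // string back yields an equal BigDecimal (same value AND same scale).
        String asJsonNumber = precise.toPlainString();      // "42.5400"
        BigDecimal roundTripped = new BigDecimal(asJsonNumber);
        System.out.println(precise.equals(roundTripped));   // true

        // Going through double drops the scale information.
        BigDecimal viaDouble = BigDecimal.valueOf(precise.doubleValue());
        System.out.println(precise.equals(viaDouble));      // false: 42.54 vs 42.5400

        // toPlainString() also avoids exponent notation for small values.
        System.out.println(new BigDecimal("0.012000").toPlainString()); // 0.012000
    }
}
```

This is why serialising BigDecimal as its toPlainString() text and re-parsing that string preserves precision, while any detour through double does not.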


Re: elasticsearch java interaction

2014-03-21 Thread Venu Krishna
you can find my elasticsearch window i.e overview window and browse window. 
my java code you can see below.

from these windows i think you can get what i am doing.
i just create new indexes with type and their id simply .
before this i din't created any cluster or any node e.t.c. ,from my 
knowledge i have not created,but from ES site  i came to know that cluster 
will be created bydefault and node aswell,but i have not given any cluster 
name or any specific thing or nodes.

i think i am not making any complication about this.

but to be frank..i am moving here and there like a child to know what is 
happening.

  Java Code

import java.util.HashSet;
import java.util.Set;

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;


public class JavaES_Client {

    Client client;
    Set<String> hosts;

    void function() {
        // on startup
        System.out.println("In Function");

        // The Java transport protocol listens on 9300; 9200 is the HTTP port.
        client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));

        System.out.println("Connected");
        // on shutdown
        client.close();
    }

    void connectES() {
        hosts = new HashSet<String>();
        hosts.add("localhost");
        // hosts.add("host2.host1.mydomain.com"); // make sure this resolves to a proper IP address

        // cluster.name must match the cluster.name of the ES node you connect to
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "vesseltrackerES").build();

        TransportClient transportClient = new TransportClient(settings);
        for (String host : this.hosts) {
            // Use the transport port (9300), not the HTTP port (9200).
            transportClient = transportClient.addTransportAddress(
                    new InetSocketTransportAddress(host, 9300));
        }

        System.out.print("Connected to nodes : ");
        for (DiscoveryNode node : transportClient.connectedNodes()) {
            System.out.print(node.getName() + " , ");
        }
        System.out.println("");

        this.client = transportClient;
    }


public static void main(String[] args) {

System.out.println("In Main Method");
JavaES_Client jc = new JavaES_Client();
System.out.println("Object Created");
//jc.function();
jc.connectES();

}

}



On Thursday, March 20, 2014 8:41:08 PM UTC+5:30, Georgi Ivanov wrote:
>
> There is something wrong with your set-up
>
> How many ES node s you have ?
> On which IP addresses are ES hosts listening ?
>
> I understood you have 2 hosts , but it seems you have only one on your 
> local machine .
>
> This is the code (a bit modified) I am using at the moment 
>
> 
> public void connectES() {
> Set<String> hosts = new HashSet<String>();
>   hosts.add("host1.mydomain.com");
>   hosts.add("host2.host1.mydomain.com"); // Make sure this resolvs to 
> proper IP address
>   Settings settings = ImmutableSettings.settingsBuilder()
>   .put("cluster.name", "vesseltrackerES").build();
>
>   TransportClient transportClient = new TransportClient(settings);
>   for (String host : this.hosts) {
> transportClient = transportClient.addTransportAddress(new 
> InetSocketTransportAddress(host, 9300));
>   }
>
>   System.out.print("Connected to nodes : ");
>   for (DiscoveryNode node : transportClient.connectedNodes()) {
> System.out.print(node.getHostName() + " , ");
>   }
>   System.out.println("");
>
>   this.client = (Client) transportClient;
> }
>
>
> On Thursday, March 20, 2014 2:51:50 PM UTC+1, Venu Krishna wrote:
>>
>> Actually this is my elasticsearch index: http://localhost:9200/. As you 
>> told me, I have replaced 9200 with 9300 in the above code; when I executed 
>> the application I got the following exceptions.
>>
>> Mar 20, 2014 7:17:45 PM org.elasticsearch.client.transport
>> WARNING: [Bailey, Gailyn] failed to get node info for 
>> [#transport#-1][inet[localhost/127.0.0.1:9300]]
>> org.elasticsearch.transport.NodeDisconnectedException: 
>> [][inet[localhost/127.0.0.1:9300]][/cluster/nodes/info] disconnected
>>
>> Connected
>> Mar 20, 2014 7:17:50 PM org.elasticsearch.client.transport
>> WARNING: [Bailey, Gailyn] failed to get node info for 
>> [#transport#-1][inet[localhost/127.0.0.1:9300]]
>> org.elasticsearch.transport.NodeDisconnectedException: 
>> [][inet[localhost/127.0.0.1:9300]][/cluster/nodes/info] disconnected
>>
>> Mar 20, 2014 7:17:50 PM org.elasticsearch.client.transport
>> WARNING: [Bailey, Gailyn] failed to get node info for 
>> [#transport#-1][inet[localhost/127.0.0.1:9300]]
>> org.elasticsearch.transport.NodeDisconnectedException: 
>> [][inet[localhost/127.0.0.1:9300]][/cluster/nodes/info] disconnected
>>
>> Thankyou
>>
>> On Thursday, March 20, 2014 7:12:14 PM UTC+5:30, David Pilato

Re: Elasticsearch 1.0.1 on AWS

2014-03-21 Thread Geet Gangwar
Hi David,

I tried unicast after removing node.master=false, so now any of my nodes
can become master.

In this case unicast is working, but I have to set up the unicast hosts
manually on both nodes.

I have also installed cloud-aws plugin on both nodes, still no logs are
getting generated for ec2 discovery.
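One way to make the ec2 discovery activity visible is to raise the relevant log levels in logging.yml; a sketch, assuming the standard logger names used by ES 1.x and the cloud-aws plugin:

```yaml
# logging.yml fragment: surface EC2 discovery attempts in the logs
logger:
  discovery.ec2: TRACE     # ES EC2 discovery module
  com.amazonaws: DEBUG     # underlying AWS SDK calls (package name assumed)
```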

Regards

Geet


On Fri, Mar 21, 2014 at 2:46 PM, Geet Gangwar  wrote:

> I have set private ip of my data node.
>
>  discovery.zen.ping.unicast.hosts: ["10.142.181.16"]
> and have disabled the multicast
>
> discovery.zen.ping.multicast.enabled: false
>
>
> Regards
>
> Geet
>
>
> On Fri, Mar 21, 2014 at 2:17 PM, David Pilato  wrote:
>
>> status is incorrect but I guess it's due to the fact your data node is
>> not a master node and can't find a master.
>>
>> How did you set unicast?
>>
>>  --
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
>> @dadoonet  | 
>> @elasticsearchfr
>>
>>
On 21 March 2014 at 09:33:35, Geet Gangwar (geetgang...@gmail.com) wrote:
>>
>>   Thanks David for such a quick reply.
>>
>> I tried you command from one node to second node and got following result.
>>
>> {
>>   "status" : 503,
>>   "name" : "cleandata-DataNode-1",
>>   "version" : {
>> "number" : "1.0.1",
>> "build_hash" : "5c03844e1978e5cc924dab2a423dc63ce881c42b",
>> "build_timestamp" : "2014-02-25T15:52:53Z",
>> "build_snapshot" : false,
>> "lucene_version" : "4.6"
>>   },
>>   "tagline" : "You Know, for Search"
>> }
>>
>> Dont know whehter it is correct or something is wrong.
>>
>> Regards
>>
>> Geet
>>
>>
>> On Fri, Mar 21, 2014 at 12:43 PM, David Pilato  wrote:
>>
>>>  Are they sharing the same security group name, and are they deployed in
>>> the same region?
>>> If unicast does not work using private IP, the aws plugin won't work either.
>>>
>>> Can you from one node run
>>>
>>> curl http://secondnodeip:9200/
>>>
>>> Same from second node.
>>>
>>>
>>> --
>>> David ;-)
>>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>>
>>>
>>> On 21 March 2014 at 08:08, Geet Gangwar wrote:
>>>
>>>   Hi
>>>
>>> I am also facing the same issue of no discovery.ec2 logs getting generated
>>> for elasticsearch 1.0.1 and cloud-aws plugin 2.0.0.RC
>>>
>>> I tried the corrected version of setting given:
>>>
>>> cloud:
>>>aws:
>>>  access_key: XX
>>>  secret_key: XX
>>>  region: ap-southeast-1
>>> discovery:
>>>  type: ec2
>>>  groups: 
>>>
>>> Difference is the region.
>>>
>>> I have created two nodes(one is master node with node.data = false and
>>> other one is just a data node with node.master = false)
>>>
>>> Even when I try for unicast and specify hard coded private ip then also
>>> the second node does not joins the cluster.
>>>
>>> I installed head plugin on the master node which is working fine and I m
>>> able to view one node in the browser.
>>>
>>>
>>> Please guide me on what I am doing wrong.
>>>
>>>
>>>
>>>
>>>
>>> On Thursday, March 20, 2014 3:51:36 AM UTC+5:30, Daniel F. wrote:

 Thanks David,

 You are right elasticsearch.yml had two errors: space before
 "discovery" and "region" should not be equal to availability zone "
 eu-west-1a" but "eu-west-1".

 The working version is:

 discovery.zen.ping.multicast.enabled: false

 cloud:
   aws:
 access_key: 
 secret_key: 
 region: eu-west-1
 discovery:
 type: ec2
 groups: my-security-group

 Daniel

 On Wednesday, March 19, 2014 10:21:36 PM UTC+2, David Pilato wrote:
>
>  I think your elasticsearch.yml is incorrect.
>
>  Just saying because as far as I can see, ec2 discovery does not start
> actually.
>  So, may be some space before cloud.type: ec2 is missing?
>
>
>  --
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com
> *
> @dadoonet  | 
> @elasticsearchfr
>
>
> On 19 March 2014 at 19:31:03, Daniel F. (feins...@gmail.com) wrote:
>
>  Yes, i am running elasticsearch as root. It is a test environment so
> I did not tune it yet.
> --
> You received this message because you are subscribed to the Google
> Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to elasticsearc...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/5f49602f-9db0-4e68-9037-310a87064f1e%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>
>--
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.

Re: User defined dictionary in lingo3g for Elasticsearch wrt label/word/synonym

2014-03-21 Thread Dawid Weiss
> As I have not copied the default dictionary from the Java API bundle to my
> ES/config, clustering of documents still happens - so on what basis does
> that clustering happen? Is the default dictionary also bundled with the
> lingo3g jar file, so that if we place a custom dictionary file it will
> override the default bundled with the jars, if any?

The default lexical resources are also included in the lingo3g.jar,
correct. These are fallback defaults, so if you specify your own
resources, those will take priority.

Dawid



Re: User defined dictionary in lingo3g for Elasticsearch wrt label/word/synonym

2014-03-21 Thread Prashant Agrawal
By the 4th point I mean: where exactly can I have a look at the default
word dictionary in ES (as per my setup I installed ES + carrot2 + copied
the Java API for lingo3g), even though I have not copied the word dictionary
manually to my config.

As I have not copied the default dictionary from the Java API bundle to my
ES/config, clustering of documents still happens, so on what basis does that
clustering happen?
Is it that the default dictionary is also bundled with the lingo3g jar file,
so that if we place a custom dictionary file it will override the default
dictionary bundled with the jar, if any?



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/User-defined-dictionary-in-lingo3g-for-Elasticsearch-wrt-label-word-synonym-tp4052442p4052455.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: User defined dictionary in lingo3g for Elasticsearch wrt label/word/synonym

2014-03-21 Thread Dawid Weiss
I'm not sure I understand your 4th question... The Lingo3G manual
(pointed to by Jörg) has an explicit location where lexical resources
should be placed:

> If you have any custom lexical resources then the override folder is
> ${es.home}/config/ by default.
> So, for example, placing word-dictionary.en.xml there will override the
> default English word dictionary.

All the default lexical resources come with Lingo3G bundles (for
example the Lingo3G Java API bundle) and you can copy the defaults
over from there.

Dawid


On Fri, Mar 21, 2014 at 9:53 AM, Prashant Agrawal
 wrote:
> Hi Jörg,
>
> While exploring more I got the answers for the first 3 points; I just wanted
> to get clarification for point 4:
>
> <<4) How I can check the built-in word databases wrt ES for clustering, is
> word-dictionary.en.xml the built-in database file which I can find in ES
> after configuring ES with carrot2 and Lingo3g?
> << Source: http://download.carrotsearch.com/lingo3g/manual/#section.attribute.use-built-in-word-database-for-label-filtering
>
>
>
> --
> View this message in context: 
> http://elasticsearch-users.115913.n3.nabble.com/User-defined-dictionary-in-lingo3g-for-Elasticsearch-wrt-label-word-synonym-tp4052442p4052451.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>


Re: Elasticsearch 1.0.1 on AWS

2014-03-21 Thread Geet Gangwar
I have set private ip of my data node.

 discovery.zen.ping.unicast.hosts: ["10.142.181.16"]
and have disabled the multicast

discovery.zen.ping.multicast.enabled: false


Regards

Geet


On Fri, Mar 21, 2014 at 2:17 PM, David Pilato  wrote:

> status is incorrect but I guess it's due to the fact your data node is not
> a master node and can't find a master.
>
> How did you set unicast?
>
>  --
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | 
> @elasticsearchfr
>
>
> Le 21 mars 2014 à 09:33:35, Geet Gangwar (geetgang...@gmail.com) a écrit:
>
>   Thanks David for such a quick reply.
>
> I tried your command from one node to the second node and got the following result.
>
> {
>   "status" : 503,
>   "name" : "cleandata-DataNode-1",
>   "version" : {
> "number" : "1.0.1",
> "build_hash" : "5c03844e1978e5cc924dab2a423dc63ce881c42b",
> "build_timestamp" : "2014-02-25T15:52:53Z",
> "build_snapshot" : false,
> "lucene_version" : "4.6"
>   },
>   "tagline" : "You Know, for Search"
> }
>
> Don't know whether it is correct or something is wrong.
>
> Regards
>
> Geet
>
>
> On Fri, Mar 21, 2014 at 12:43 PM, David Pilato  wrote:
>
>>  They are sharing same security group name and are deployed in same
>> region?
>> If unicast does not work using private IP, aws plugin won't work either.
>>
>> Can you from one node run
>>
>> curl http://secondnodeip:9200/
>>
>> Same from second node.
>>
>>
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>
>>
>> Le 21 mars 2014 à 08:08, Geet Gangwar  a écrit :
>>
>>   Hi
>>
>> I am also facing the same issue of no discovery.ec2 logs getting generated for
>> elasticsearch 1.0.1 and cloud-aws plugin 2.0.0.RC
>>
>> I tried the corrected version of setting given:
>>
>> cloud:
>>aws:
>>  access_key: XX
>>  secret_key: XX
>>  region: ap-southeast-1
>> discovery:
>>  type: ec2
>>  groups: 
>>
>> Difference is the region.
>>
>> I have created two nodes(one is master node with node.data = false and
>> other one is just a data node with node.master = false)
>>
>> Even when I try unicast and specify a hard-coded private IP, the
>> second node still does not join the cluster.
>>
>> I installed head plugin on the master node which is working fine and I m
>> able to view one node in the browser.
>>
>>
>> Please guide me on what I am doing wrong.
>>
>>
>>
>>
>>
>> On Thursday, March 20, 2014 3:51:36 AM UTC+5:30, Daniel F. wrote:
>>>
>>> Thanks David,
>>>
>>> You are right elasticsearch.yml had two errors: space before "discovery"
>>> and "region" should not be equal to availability zone "eu-west-1a" but "
>>> eu-west-1".
>>>
>>> The working version is:
>>>
>>> discovery.zen.ping.multicast.enabled: false
>>>
>>> cloud:
>>>   aws:
>>> access_key: 
>>> secret_key: 
>>> region: eu-west-1
>>> discovery:
>>> type: ec2
>>> groups: my-security-group
>>>
>>> Daniel
>>>
>>> On Wednesday, March 19, 2014 10:21:36 PM UTC+2, David Pilato wrote:

  I think your elasticsearch.yml is incorrect.

  Just saying because as far as I can see, ec2 discovery does not start
 actually.
  So, may be some space before cloud.type: ec2 is missing?


  --
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com
 *
 @dadoonet  | 
 @elasticsearchfr


 Le 19 mars 2014 à 19:31:03, Daniel F. (feins...@gmail.com) a écrit:

  Yes, i am running elasticsearch as root. It is a test environment so
 I did not tune it yet.

Re: choosing shard vs alias in elasticsearch

2014-03-21 Thread Adrien Grand
On Fri, Mar 21, 2014 at 5:41 AM, Chetana  wrote:

> 1. Is it a good idea to create shards based on size/period, or create one
> shard with multiple aliases based on filter conditions?
>

I would recommend using time-based indices; you can hear about the
rationale at
http://www.elasticsearch.org/videos/big-data-search-and-analytics/ (the
part you are interested in starts at 21:15, but I would recommend watching
the whole video, which gives interesting ideas about how to model data with
Elasticsearch). For example, you could imagine building monthly indices. Then
tools like curator can help you manage old indices, in order to force-merge
(optimize) the read-only ones and delete the old ones.


> 2. Does ES merge the search results coming from multiple shards? If yes,
> how fast or good is it compared to Lucene's ranking based on the vector
> space model?
>

Indeed, Elasticsearch merges search results: each shard returns its top
hits (via Lucene) and the node that coordinates search execution merges
these per-shard top-hits based on a priority queue.
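That merge step can be sketched in a few lines of Python; the function name and tuple layout here are illustrative, not Elasticsearch internals. Each shard contributes hits already sorted by descending score, and the coordinator merges them lazily via a priority queue:

```python
import heapq
import itertools

def merge_shard_hits(shard_hits, size):
    """Merge per-shard top hits (each list sorted by descending score)
    into the global top-`size` hits, highest score first."""
    # heapq.merge keeps a priority queue over the shard iterators, so
    # only about `size` elements are ever pulled from each shard.
    merged = heapq.merge(*shard_hits, key=lambda hit: -hit[0])
    return list(itertools.islice(merged, size))

shards = [
    [(3.2, "doc-a"), (1.0, "doc-b")],  # top hits from shard 0
    [(2.5, "doc-c"), (0.4, "doc-d")],  # top hits from shard 1
]
print(merge_shard_hits(shards, 3))  # [(3.2, 'doc-a'), (2.5, 'doc-c'), (1.0, 'doc-b')]
```

Because each per-shard list is already sorted, the merge never needs to re-sort everything, which is why fan-out search scales well with shard count.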

-- 
Adrien Grand



Re: User defined dictionary in lingo3g for Elasticsearch wrt label/word/synonym

2014-03-21 Thread Prashant Agrawal
Hi Jörg,

While exploring more I got the answers for the first 3 points; I just wanted
to get clarification for point 4:

<<4) How I can check the built-in word databases wrt ES for clustering, is
word-dictionary.en.xml the built-in database file for ES?
<< Source: http://download.carrotsearch.com/lingo3g/manual/#section.attribute.use-built-in-word-database-for-label-filtering



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/User-defined-dictionary-in-lingo3g-for-Elasticsearch-wrt-label-word-synonym-tp4052442p4052451.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: Elasticsearch 1.0.1 on AWS

2014-03-21 Thread David Pilato
status is incorrect but I guess it's due to the fact your data node is not a 
master node and can't find a master.

How did you set unicast?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 21 mars 2014 à 09:33:35, Geet Gangwar (geetgang...@gmail.com) a écrit:

Thanks David for such a quick reply.

I tried your command from one node to the second node and got the following result.

{
  "status" : 503,
  "name" : "cleandata-DataNode-1",
  "version" : {
    "number" : "1.0.1",
    "build_hash" : "5c03844e1978e5cc924dab2a423dc63ce881c42b",
    "build_timestamp" : "2014-02-25T15:52:53Z",
    "build_snapshot" : false,
    "lucene_version" : "4.6"
  },
  "tagline" : "You Know, for Search"
}

Don't know whether it is correct or something is wrong.

Regards

Geet


On Fri, Mar 21, 2014 at 12:43 PM, David Pilato  wrote:
They are sharing same security group name and are deployed in same region?
If unicast does not work using private IP, aws plugin won't work either.

Can you from one node run 

curl http://secondnodeip:9200/

Same from second node.


--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Le 21 mars 2014 à 08:08, Geet Gangwar  a écrit :

Hi

I am also facing the same issue of no discovery.ec2 logs getting generated for
elasticsearch 1.0.1 and cloud-aws plugin 2.0.0.RC

I tried the corrected version of setting given:

cloud:
   aws:
 access_key: XX
 secret_key: XX
 region: ap-southeast-1
discovery:
 type: ec2
 groups: 

Difference is the region.

I have created two nodes(one is master node with node.data = false and other 
one is just a data node with node.master = false)

Even when I try unicast and specify a hard-coded private IP, the
second node still does not join the cluster.

I installed the head plugin on the master node, which is working fine, and I'm
able to view one node in the browser.


Please guide me on what I am doing wrong.





On Thursday, March 20, 2014 3:51:36 AM UTC+5:30, Daniel F. wrote:
Thanks David,

You are right elasticsearch.yml had two errors: space before "discovery" and 
"region" should not be equal to availability zone "eu-west-1a" but "eu-west-1".

The working version is:

discovery.zen.ping.multicast.enabled: false

cloud:
  aws:
    access_key: 
    secret_key: 
    region: eu-west-1
discovery:
    type: ec2
    groups: my-security-group

Daniel

On Wednesday, March 19, 2014 10:21:36 PM UTC+2, David Pilato wrote:
I think your elasticsearch.yml is incorrect.

Just saying because as far as I can see, ec2 discovery does not start actually.
So, may be some space before cloud.type: ec2 is missing?


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 19 mars 2014 à 19:31:03, Daniel F. (feins...@gmail.com) a écrit:

Yes, i am running elasticsearch as root. It is a test environment so I did not 
tune it yet.

Re: EC2 Discovery

2014-03-21 Thread Norberto Meijome
Don't try ec2 discovery until you have tested that:
- you can connect from one machine to another on port 9300 (nc as client
and server; basic networking/firewalling)
- you can run a simple aws ec2 describe-instances call with the API key you
plan to use, and you can see the machines you need there. Bonus points for
filtering based on the rules you intend to use (sec group, tags). This is
to ensure your API keys have the correct access.

Once you have those basic steps working, use them in the ES config.

Make sure you enable ec2 discovery and disable the default zen multicast
discovery (it will run first and likely time out, and ec2 discovery won't
get a chance to execute).

The other thing to watch out for is contacting nodes which are too busy to
ack your new node's request for cluster info... but that would be a problem
with zen discovery too.
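The checklist above feeds directly into the plugin configuration; a sketch of the elasticsearch.yml shape, mirroring the working example quoted earlier in this digest (keys, region, and group name are placeholders):

```yaml
# elasticsearch.yml -- sketch; all values are placeholders
discovery.zen.ping.multicast.enabled: false  # stop multicast from running (and timing out) first

cloud:
  aws:
    access_key: <YOUR_ACCESS_KEY>   # key must be allowed to call ec2:DescribeInstances
    secret_key: <YOUR_SECRET_KEY>
    region: eu-west-1               # the region itself, not an availability zone like eu-west-1a
discovery:
  type: ec2
  groups: my-security-group         # hypothetical security group used for instance filtering
```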
On 21/03/2014 12:31 PM, "Raphael Miranda"  wrote:

> are both machines in the same security group?
>


aggreation values with date time

2014-03-21 Thread Subhadip Bagui
Hi,

I'm using the below aggs for getting max/avg values from 2 data fields.
But the result is coming in Unix time format, I guess. Can I get the result
in a normal time format?

query ==>
 "aggs" : {
     "max_time" : {
         "max" : {
             "script" : "doc['gi_xdr_info_end_time'].value - doc['gi_xdr_info_start_time'].value"
         }
     },
     ...
 }

result ==>
 "aggregations" : {
     "max_time" : { "value" : 6.0 },
     "min_time" : { "value" : 0.0 },
     "avg_time" : { "value" : 4.6664 }
 }

mapping ==>
 "call_start_time" : {
     "type" : "date",
     "format" : "dateOptionalTime"
 }
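A note on why the numbers look raw: date fields are stored internally as epoch milliseconds, so the end-minus-start script yields a plain millisecond duration, which the aggregation returns as a number. Converting it client-side is straightforward; a minimal Python sketch (assuming the values are milliseconds):

```python
from datetime import timedelta

def ms_to_human(ms):
    """Render a millisecond duration (e.g. an end_time - start_time
    script aggregation value) as H:MM:SS.ffffff."""
    return str(timedelta(milliseconds=ms))

print(ms_to_human(4500))  # 0:00:04.500000
```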



Re: 2 clusters versus 1 big cluster?

2014-03-21 Thread Norberto Meijome
@mauri, thank you for such interesting analysis.
On 21/03/2014 1:01 PM, "Mauri"  wrote:

> Hi Brad
>
> I agree with what Mark and Zachary have said and will expand on these.
>
> Firstly, shard and index level operations in ElasticSearch are
> peer-to-peer. Single-shard operations will affect at most 2 nodes, the node
> receiving the request and the node hosting an instance of the shard
> (primary or replica depending on the operation). Multi-shard operations
> (such as searches) will affect from one to (N + 1) nodes, where N is the
> number of shards in the index.
>
> So from an index/shard operation perspective there is no reason to split
> into two clusters. The key issue with index/shard operations is that the
> cluster is able to handle the traffic volume. So if you do decide to split
> out into to two clusters you will need to look at the load profile for each
> of your client types to determine how much raw processing power you need in
> each cluster. It may be that a 10:20 split is more optimum than a 15:15
> split between clusters to balance request traffic, and therefore CPU
> utilisation, across all nodes. If you go with one cluster this is not an
> issue because you can move shards between nodes to balance the request
> traffic.
>
> Larger clusters also imply more work for the cluster master in managing
> the cluster. This comes down to the number of nodes that the master has to
> communicate with, and manage, and the size of the cluster state. A cluster
> with 30 nodes is not too large for a master to keep track of. There will be
> an increase in network traffic associated with the increase in volume of
> master-to-worker and worker-to-master pings used to detect the
> presence/absence of nodes. This can be offset by reducing the respective
> ping intervals.
>
> In a large cluster it is good practice to have a group of dedicated master
> nodes, say 3, from which the master is elected. These nodes do not host any
> user data meaning that cluster management is not compromised by high user
> request traffic.
>
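The dedicated-master and ping-interval points above map onto a handful of 1.x settings; a hedged sketch (the fd.* values shown are the documented defaults, kept here only as a starting point for tuning):

```yaml
# elasticsearch.yml on each of the (say) 3 dedicated master nodes
node.master: true
node.data: false                        # hosts no user data
discovery.zen.minimum_master_nodes: 2   # majority of the 3 master-eligible nodes

# fault-detection pings between master and workers (defaults shown);
# raising the interval trades detection latency for less ping traffic
discovery.zen.fd.ping_interval: 1s
discovery.zen.fd.ping_timeout: 30s
discovery.zen.fd.ping_retries: 3
```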
> The size of the cluster state may be more of an issue. The cluster state
> comprises all of the information about the cluster configuration. The
> cluster state has records for each node, index, document mapping, shard,
> etc. Whenever there is a change to the cluster state it is first made by
> the master which then sends the updated cluster state to each worker node.
> Note that the entire cluster state is sent, not just the changes! It is
> therefore highly desirable to limit that frequency of changes to the
> cluster state, primarily by minimizing dynamic field mapping updates, and
> the overall size of the cluster state, primarily by minimizing the number
> of indices.
>
> In your proposed model the size of the cluster state associated the set of
> 60 shared month indices will be larger than that of one set of 60 dedicated
> month indices by virtue of having 100 shards to 6. However, it may not be
> much bigger because there will be much more metadata associated with
> defining the index structure, notably the field mappings for all document
> types in the index, than the metadata defining the shards of the index. So
> it may well be that the size of the cluster state associated with 60
> "shared" month indices plus N sets of 60 "dedicated" indices is not much
> more than that of (N + 1) sets of 60 "dedicated" indices. So there may not
> be much point in splitting to two clusters. A quick way to look at this for
> your actual data model is to:
>   1. Set up an index in ES with mappings for all document types and 6
> shards and 0 replicas,
>   2. Retrieve the index metadata JSON using ES admin API,
>   3. Increase the number of replicas to 16 (102 shards total),
>   4. Retrieve the index metadata JSON using ES admin API,
>   5. Compare the two JSON documents from 2 and 4.
>
> As state above it is desirable to minimize the number of indices. Each
> shard is a Lucene index which consumes memory and requires open file
> descriptors from the OS for segment data files and Lucene index level
> files. You may find yourself running out of memory and/or file descriptors
> if you are not careful.
>
> I understand you are looking for a design that will cater for on disc data
> volume. Given that your data is split into monthly indices it may well be
> that no one index, either "shared" or "dedicated" will reach that volume in
> one month. There may also be seasonal factors to consider whereby one or
> two months have much higher volumes than others. I have read/heard about
> cases where a monthly index architecture was implemented but later scrapped
> in favour of a single index approach because the month-to-month variation in volume
> was detrimental to overall system resource utilisation and performance.
>
> In your case think about whether monthly indices are really appropriate. An
> alternative model is to partition one year's worth of data into a set of
> indices bounded by size rather than time. In this 

Re: User defined dictionary in lingo3g for Elasticsearch wrt label/word/synonym

2014-03-21 Thread joergpra...@gmail.com
Have you checked http://download.carrotsearch.com/lingo3g/manual/#section.es and
https://github.com/carrot2/elasticsearch-carrot2 and
https://github.com/carrot2/elasticsearch-carrot2/tree/master/src/main/resources ?

Jörg



On Fri, Mar 21, 2014 at 9:08 AM, Prashant Agrawal <
prashant.agra...@paladion.net> wrote:

> While browsing the lingo3g manual I came across with
>
> http://download.carrotsearch.com/lingo3g/1.9.0/manual/#chapter.lexical-resources
>
> Which states that we can customize the name of the label as per pre defined
> Word/Label dictionary.
>
> So I have some doubts on basis of that:
>
> 1) Where exactly these files have to be kept in ES (either in ES/config or
> somewhere else)
>
> 2) Is it like if we are using these dictionaries so default dictionary with
> POS will not work in clustering the label?
>
> 3) If we use these particular dictionaries so the label name after
> clustering will be formed on basis of this only or some other logic is also
> there?
>
> 4) How I can check the built-in word databases wrt ES for clustering, is
> word-dictionary.en.xml is the built-in databse file for ES? Source:
>
> http://download.carrotsearch.com/lingo3g/manual/#section.attribute.use-built-in-word-database-for-label-filtering
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/User-defined-dictionary-in-lingo3g-for-Elasticsearch-wrt-label-word-synonym-tp4052442.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>


Re: Elasticsearch 1.0.1 on AWS

2014-03-21 Thread Geet Gangwar
Thanks David for such a quick reply.

I tried your command from one node to the second node and got the following result.

{
  "status" : 503,
  "name" : "cleandata-DataNode-1",
  "version" : {
"number" : "1.0.1",
"build_hash" : "5c03844e1978e5cc924dab2a423dc63ce881c42b",
"build_timestamp" : "2014-02-25T15:52:53Z",
"build_snapshot" : false,
"lucene_version" : "4.6"
  },
  "tagline" : "You Know, for Search"
}

Don't know whether it is correct or something is wrong.

Regards

Geet


On Fri, Mar 21, 2014 at 12:43 PM, David Pilato  wrote:

> They are sharing same security group name and are deployed in same region?
> If unicast does not work using private IP, aws plugin won't work either.
>
> Can you from one node run
>
> curl http://secondnodeip:9200/
>
> Same from second node.
>
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> Le 21 mars 2014 à 08:08, Geet Gangwar  a écrit :
>
> Hi
>
> I am also facing the same issue of no discovery.ec2 logs getting generated for
> elasticsearch 1.0.1 and cloud-aws plugin 2.0.0.RC
>
> I tried the corrected version of setting given:
>
> cloud:
>aws:
>  access_key: XX
>  secret_key: XX
>  region: ap-southeast-1
> discovery:
>  type: ec2
>  groups: 
>
> Difference is the region.
>
> I have created two nodes(one is master node with node.data = false and
> other one is just a data node with node.master = false)
>
> Even when I try unicast and specify a hard-coded private IP, the
> second node still does not join the cluster.
>
> I installed the head plugin on the master node, which is working fine, and I'm
> able to view one node in the browser.
>
>
> Please guide me on what I am doing wrong.
>
>
>
>
>
> On Thursday, March 20, 2014 3:51:36 AM UTC+5:30, Daniel F. wrote:
>>
>> Thanks David,
>>
>> You are right elasticsearch.yml had two errors: space before "discovery"
>> and "region" should not be equal to availability zone "eu-west-1a" but "
>> eu-west-1".
>>
>> The working version is:
>>
>> discovery.zen.ping.multicast.enabled: false
>>
>> cloud:
>>   aws:
>> access_key: 
>> secret_key: 
>> region: eu-west-1
>> discovery:
>> type: ec2
>> groups: my-security-group
>>
>> Daniel
>>
>> On Wednesday, March 19, 2014 10:21:36 PM UTC+2, David Pilato wrote:
>>>
>>> I think your elasticsearch.yml is incorrect.
>>>
>>> Just saying because as far as I can see, ec2 discovery does not start
>>> actually.
>>> So, may be some space before cloud.type: ec2 is missing?
>>>
>>>
>>>  --
>>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com
>>> *
>>> @dadoonet  | 
>>> @elasticsearchfr
>>>
>>>
>>> Le 19 mars 2014 à 19:31:03, Daniel F. (feins...@gmail.com) a écrit:
>>>
>>>  Yes, i am running elasticsearch as root. It is a test environment so I
>>> did not tune it yet.

Log for ALL elasticsearch search queries, query latency, and query hits

2014-03-21 Thread Dhruv Garg
Hey folks,

I'd like to setup logstash to keep track of all search queries made against 
my elasticsearch cluster along with the amount of time it takes to return 
the results and number of results returned. Ideally the log file should 
contain something like:

{ query: foobar, time_taken: 100ms, num_hits: 5 }

Does elasticsearch expose a logfile that contains this information? I could 
only see the query_slow and index_slow log files, which contain a subset of 
the information I'd like logstash to process.

Worst case, I log this data in the application layer, but it would be nice 
if elasticsearch took care of it - better separation of concerns.

Thanks,

Dhruv
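A workaround that gets close to this with stock Elasticsearch 1.x (an assumption on my part, not something confirmed in this thread) is to lower the search slowlog thresholds to zero, so that every query is written to the slowlog together with the time it took:

```yaml
# elasticsearch.yml sketch (hypothetical settings for ES 1.x): with the
# trace threshold at 0s, every search request is written to the search
# slowlog, including its execution time. The number of hits is not part
# of the slowlog line, so that would still have to be logged in the
# application layer.
index.search.slowlog.threshold.query.trace: 0s
index.search.slowlog.threshold.fetch.trace: 0s
```

Logstash could then tail the slowlog file instead of an application-level log.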

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/88771eba-8abf-49e3-a8e6-c4df00fcdacc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Running aggregations on two different nested objects

2014-03-21 Thread Adrien Grand
On Fri, Mar 21, 2014 at 8:15 AM, Jean-Noël Rivasseau wrote:

> Thanks for your reply. What do you mean by "not possible to escape it"?
> Could you provide sample code in Java that would work if the necessary
> changes were implemented?
>

The nested field mapper stores data as separate Lucene documents. What the
nested aggregation does is, for every incoming (parent) document ID, call
its sub-aggregations with the document IDs of the child documents. The
sub-aggregations are not aware that they are being applied to child
documents; to them it makes no difference, they just do their usual work,
but on different doc IDs.

What would be needed to make your aggregation work is another aggregation
able to translate these child doc IDs back to their parent's doc ID, which
is something that we don't have today.

-- 
Adrien Grand
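(The child-to-parent translation described above was added to Elasticsearch later, around version 1.2, as the reverse_nested aggregation; it is not available in the 1.0.x/1.1 releases discussed in this thread. A sketch of the amenities-per-category question under that assumption, reusing the field names from the original message:)

```json
{
  "aggs": {
    "places": {
      "nested": { "path": "placesVisited" },
      "aggs": {
        "per_category": {
          "terms": { "field": "placesVisited.category" },
          "aggs": {
            "back_to_stay": {
              "reverse_nested": {},
              "aggs": {
                "amenities": {
                  "nested": { "path": "amenities" },
                  "aggs": {
                    "amenities_used": {
                      "terms": { "field": "amenities.amenityId" }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```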



User defined dictionary in lingo3g for Elasticsearch wrt label/word/synonym

2014-03-21 Thread Prashant Agrawal
While browsing the lingo3g manual I came across
http://download.carrotsearch.com/lingo3g/1.9.0/manual/#chapter.lexical-resources

which states that we can customize label names via a predefined word/label
dictionary.

So I have some doubts based on that:

1) Where exactly do these files have to be kept in ES (in ES/config or
somewhere else)?

2) If we use these dictionaries, will the default dictionary with POS no
longer be used when clustering labels?

3) If we use these particular dictionaries, will the label name after
clustering be formed on this basis only, or is some other logic also
involved?

4) How can I check the built-in word databases wrt ES for clustering? Is
word-dictionary.en.xml the built-in database file for ES? Source:
http://download.carrotsearch.com/lingo3g/manual/#section.attribute.use-built-in-word-database-for-label-filtering



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/User-defined-dictionary-in-lingo3g-for-Elasticsearch-wrt-label-word-synonym-tp4052442.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: JVM Heap usage didn't decline after clearing field data cache

2014-03-21 Thread joergpra...@gmail.com
For example, if you have a resident filter cache configured and there is
heap congestion, clearing the cache may temporarily help release objects
back to the JVM. After a while they are GCed.

Usually, in most other cases, cached objects are GCed when memory becomes
low, so there is rarely a need to clear the cache by command.

Jörg



On Fri, Mar 21, 2014 at 8:19 AM, Gary Gao  wrote:

> So, What is Clear Cache API for ?
>
>
> On Wed, Mar 19, 2014 at 9:22 PM, joergpra...@gmail.com <
> joergpra...@gmail.com> wrote:
>
>> JVM heap objects are reclaimed by garbage collection, not by clear cache
>> command.
>>
>> Jörg
>>
>>
>> On Wed, Mar 19, 2014 at 4:17 AM, Gary Gao  wrote:
>>
>>> Hi,
>>>  I have a total of 8GB of JVM heap; its usage was near 50%.
>>>  When I cleared the field data cache (almost 1GB) using the Clear Cache API:
>>> $curl -XPOST 'http://localhost:9200/_cache/clear?field_data=true'
>>>   the cached field data size did decrease, but JVM heap usage
>>> didn't decline.
>>>   What's the reason?
>>>
>>>
>>>



Re: Suppressing the content in Cluster response

2014-03-21 Thread Prashant Agrawal
Any help with respect to this scenario?



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Suppressing-the-content-in-Cluster-response-tp4052109p4052440.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



ElasticSearch nodes fail at random

2014-03-21 Thread Saurabh Minni
Hi,
I am facing this weird issue: I have created an ES cluster with 3 data
nodes and 1 master node.

All data nodes are dual octa-core CPU machines with 32 GB RAM, and the
master is a quad-core 16 GB machine.
I am trying to insert around 3000 records per second with the replica
count set to 1. Each record is 3 KB in size.
All goes well with the setup, but suddenly one machine's load shoots up
dramatically and the machine finally becomes unreachable.
It happens with 1 of the 3 data nodes at random.
On further inspection I can see that page faults for the java process on
that system increase very quickly compared to the others. System IO% also
shoots up.

I have used mlockall and set the index refresh rate to -1.

But in spite of all this, the cluster keeps failing. I was hoping that the
insert rate with ElasticSearch would be really high, based on posts I have
read in various places.

Please let me know if you know of any setting that needs to be changed.

Thanks for your help in advance.
Saurabh
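The two settings mentioned above correspond to the following (a hedged elasticsearch.yml sketch for ES 1.x; values are illustrative, not a recommendation for this particular cluster):

```yaml
# Lock the JVM heap into RAM so the OS cannot swap it out (requires a
# suitable memlock ulimit for the user running Elasticsearch).
bootstrap.mlockall: true

# Disable the periodic refresh during heavy indexing; with -1, new
# segments only become searchable when a refresh is triggered explicitly.
index.refresh_interval: -1
```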





Re: JVM Heap usage didn't decline after clearing field data cache

2014-03-21 Thread Gary Gao
So, What is Clear Cache API for ?


On Wed, Mar 19, 2014 at 9:22 PM, joergpra...@gmail.com <
joergpra...@gmail.com> wrote:

> JVM heap objects are reclaimed by garbage collection, not by clear cache
> command.
>
> Jörg
>
>
> On Wed, Mar 19, 2014 at 4:17 AM, Gary Gao  wrote:
>
>> Hi,
>>  I have a total of 8GB of JVM heap; its usage was near 50%.
>>  When I cleared the field data cache (almost 1GB) using the Clear Cache API:
>> $curl -XPOST 'http://localhost:9200/_cache/clear?field_data=true'
>>   the cached field data size did decrease, but JVM heap usage didn't
>> decline.
>>   What's the reason?
>>
>>
>>



Re: Running aggregations on two different nested objects

2014-03-21 Thread Jean-Noël Rivasseau
Hello,

Thanks for your reply. What do you mean by "not possible to escape it"?
Could you provide sample code in Java that would work if the necessary
changes were implemented?

Jean-Noel

On Thursday, March 20, 2014 1:51:54 PM UTC+4, Adrien Grand wrote:
>
> Hi,
>
> The aggregation doesn't work because today, when you enter the context of 
> a nested field in an aggregation, it is not possible to escape it. I don't 
> think there is an easy way to modify your data model in order to work 
> around this issue, however this is an issue that we plan to fix in the 
> future (not in the upcoming 1.1 release however, rather in a few months).
>
>
> On Wed, Mar 19, 2014 at 10:00 AM, Jean-Noël Rivasseau wrote:
>
>> Hello,
>>
>> I just started using ElasticSearch 1.0.1. I am trying to find the ideal 
>> data model and query for my exact needs, which I will explain below (I 
>> changed just the terms of the data model corresponding to my real use case, 
>> in order to see if I was able to formulate it differently, which was 
>> useful).
>>
>> I am indexing documents corresponding to BookedStay. A BookedStay has a 
>> nested array (named places) containing map objects corresponding to visited 
>> places during the stay. An object has an id (place id) and a category 
>> corresponding to the time of day of the visited place. A BookedStay then 
>> has a second nested array, corresponding to the amenities used during the 
>> stay. The objects in the array have an id (of type string) and a count.
>>
>> So a BookedStay can be represented as : {"date": 03/03/2014, 
>> "placesVisited": [{"id": 3, "category": "MORNING"}, {"id": 5, 
>> "category":  "AFTERNOON"}, {"id": 7, "category": "EVENING"}], "amenities": 
>> [{"amenityId": "restaurant", "count": 3}, {"amenityId": "dvdPlayer", 
>> "count": 1}] }
>>
>> What I want to run is a query over a given room number, and find for 
>> all BookedStay that have this given place number in their places array, an 
>> aggregate over all amenities used, per time of day.
>>
>> This amounts to finding, for all documents that have a place id of (for 
>> instance) 5, the number of times the restaurant was used, or the dvd player 
>> in the lounge, broken down by time of day. The ultimate goal is to 
>> understand better how the visit of a place in a given time of day affects 
>> the services sold by the hotel.
>>
>> I am unable to achieve this query, as when I run a first nested aggregate 
>> over the category, I cannot nest the second one over the amenities as it is 
>> in the "parent" document. Is it possible to do that? In that case, how do I 
>> specify that the nested aggregation will take place over the parent object 
>> of the current aggregation?
>>
>> Here is a tentative query with the Java driver (obviously not working, 
>> because of the above problem):
>>
>> SearchRequestBuilder srb = elasticSearchService.getClient()
>> .prepareSearch("test_index")
>> .setSearchType(SearchType.COUNT)
>> .setTypes("test_stay")
>> .setQuery(QueryBuilders.nestedQuery("placesVisited",
>> QueryBuilders.termQuery("id", 5)))
>> .addAggregation(AggregationBuilders.nested("nestedPlaceVisited").path("placesVisited")
>> .subAggregation(AggregationBuilders.filter("currentPlaceFilter")
>> .filter(FilterBuilders.termFilter("id", 5))
>> .subAggregation(AggregationBuilders.terms("countPerTimeCategory").field("category")
>> .subAggregation(AggregationBuilders.nested("nestedAmenities").path("amenities")
>> // HERE THIS subaggregation should run over the original document...
>> // and I don't know how to achieve that
>> .subAggregation(AggregationBuilders.terms("amenitiesUsed").field("amenities.amenityId"))))));
>>
>> Thanks for your help over this difficult problem! If it's not possible 
>> with "parent aggregations", how should I refactor my data model?
>>
>> Jean-Noel
>>
>
>
>
> -- 
> Adrien Grand
>  



Re: Elasticsearch 1.0.1 on AWS

2014-03-21 Thread David Pilato
Are they sharing the same security group name, and are they deployed in the same region?
If unicast does not work using the private IP, the aws plugin won't work either.

Can you run, from one node:

curl http://secondnodeip:9200/

And the same from the second node?


--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 21 March 2014 at 08:08, Geet Gangwar wrote:

Hi 

I am also facing the same issue of no discovery.ec2 logs getting generated for 
elasticsearch 1.0.1 and cloud-aws plugin 2.0.0.RC

I tried the corrected version of the settings given:

cloud:
   aws:
 access_key: XX
 secret_key: XX
 region: ap-southeast-1
discovery:
 type: ec2
 groups: 

Difference is the region. 

I have created two nodes (one is a master node with node.data = false and the other 
is just a data node with node.master = false).

Even when I try unicast and specify a hard-coded private IP, the second node 
still does not join the cluster.

I installed the head plugin on the master node, which is working fine, and I am 
able to view one node in the browser.


Please guide me on what I am doing wrong. 





> On Thursday, March 20, 2014 3:51:36 AM UTC+5:30, Daniel F. wrote:
> Thanks David,
> 
> You are right elasticsearch.yml had two errors: space before "discovery" and 
> "region" should not be equal to availability zone "eu-west-1a" but 
> "eu-west-1".
> 
> The working version is:
> 
> discovery.zen.ping.multicast.enabled: false
> 
> cloud:
>   aws:
> access_key: 
> secret_key: 
> region: eu-west-1
> discovery:
> type: ec2
> groups: my-security-group
> 
> Daniel
> 
>> On Wednesday, March 19, 2014 10:21:36 PM UTC+2, David Pilato wrote:
>> I think your elasticsearch.yml is incorrect.
>> 
>> Just saying because as far as I can see, ec2 discovery does not start 
>> actually.
>> So, may be some space before cloud.type: ec2 is missing?
>> 
>> 
>> -- 
>> David Pilato | Technical Advocate | Elasticsearch.com
>> @dadoonet | @elasticsearchfr
>> 
>> 
>>> On 19 March 2014 at 19:31:03, Daniel F. (feins...@gmail.com) wrote:
>>> 
>>> Yes, I am running elasticsearch as root. It is a test environment, so I did 
>>> not tune it yet.



Re: Elasticsearch 1.0.1 on AWS

2014-03-21 Thread Geet Gangwar
Hi 

I am also facing the same issue of no discovery.ec2 logs getting generated for 
elasticsearch 1.0.1 and cloud-aws plugin 2.0.0.RC

I tried the corrected version of the settings given:

cloud:
   aws:
 access_key: XX
 secret_key: XX
 region: ap-southeast-1
discovery:
 type: ec2
 groups: 

Difference is the region. 

I have created two nodes (one is a master node with node.data = false and the 
other is just a data node with node.master = false).

Even when I try unicast and specify a hard-coded private IP, the second node 
still does not join the cluster.

I installed the head plugin on the master node, which is working fine, and I am 
able to view one node in the browser.


Please guide me on what I am doing wrong. 





On Thursday, March 20, 2014 3:51:36 AM UTC+5:30, Daniel F. wrote:
>
> Thanks David,
>
> You are right elasticsearch.yml had two errors: space before "discovery" 
> and "region" should not be equal to availability zone "eu-west-1a" but "
> eu-west-1".
>
> The working version is:
>
> discovery.zen.ping.multicast.enabled: false
>
> cloud:
>   aws:
> access_key: 
> secret_key: 
> region: eu-west-1
> discovery:
> type: ec2
> groups: my-security-group
>
> Daniel
>
> On Wednesday, March 19, 2014 10:21:36 PM UTC+2, David Pilato wrote:
>>
>> I think your elasticsearch.yml is incorrect.
>>
>> Just saying because as far as I can see, ec2 discovery does not start 
>> actually.
>> So, may be some space before cloud.type: ec2 is missing?
>>
>>
>> -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
>> @dadoonet  | 
>> @elasticsearchfr
>>
>>
>> On 19 March 2014 at 19:31:03, Daniel F. (feins...@gmail.com) wrote:
>>
>> Yes, I am running elasticsearch as root. It is a test environment, so I 
>> did not tune it yet.
