Re: ElasticSearch built-in Jackson stream parser is fastest way to extract fields

2015-04-30 Thread Brian
Swati,

Well, I tend not to use the built-in Jackson parser anymore. The only 
advantage I've seen to stream parsing is that I can dynamically adapt to 
different objects in my own code. But I can't release the code since it's 
owned by my employer. And for most tasks these days, I use the Jackson jar 
files and the data binding model. By the way, here are the only additional 
JAR files that I use in my Elasticsearch-based tools, on top of the 
Elasticsearch jars themselves:

These provide full Jackson support. There are later versions, but these work for 
now until the rest of the company moves to Java 8:

jackson-annotations-2.2.3.jar
jackson-core-2.2.3.jar
jackson-databind-2.2.3.jar

This gives me the full Netty server (got tired of looking for it buried 
inside ES, and found this to be very simple and easy to use). Again, there 
are later versions but this one works well enough:

netty-3.5.8.Final.jar

And this is the magic that brings Netty to life. My front end simply 
publishes each incoming Netty MessageEvent to the LMAX Disruptor ring 
buffer. Then I can predefine a fixed number of background WorkHandler 
threads to consume the MessageEvent objects, handling each one and 
responding back to its client. No matter how much load is slammed into the 
front end, the number of Netty threads stays small since they only publish 
and they're done. And so, the total thread count stays small even when 
intense bursts of clients slam the server:

disruptor-3.2.0.jar

I hope this helps. I'd love to publish more details but this is about all I 
can do for now.

Brian



Consultant Needed for initial setup / audit / tuning /etc.

2015-04-22 Thread Brian Gruber
Looking to see if we have set up the system efficiently and correctly, as 
well as for general guidance.

I'm looking for someone to provide some initial consulting (not very long) 
to give my current setup a nice audit and make sure I've set things up 
efficiently/correctly. I'm having trouble finding anyone online that 
doesn't want an annual contract. Anyone here available or know of someone? 



Rebuilding master node caused data loss

2015-04-21 Thread Brian
I have a cluster with 5 data nodes and 1 master node.  I decided to test a 
master node failure, and clearly I misunderstood exactly what is stored 
on the master.  I shut down the VM running the master node, and built a 
new one from scratch.  I then added it to the cluster as a master.  When 
this came online, I lost all the data that was in the cluster previously and it 
started creating fresh, empty indexes again.  This isn't critical data (this 
is my test setup), but it still confused me.

I have looked into this and it would seem there is a default setting for 
gateway.local.auto_import_dangled. 
As I understand it, this was put in place for people like me who didn't 
understand what would happen if you lost a master node, and should by 
default have imported the old data from each data node.  If this was 
defaulted to "no", and just deleted the data, I would know exactly what 
happened.  I have looked at my configurations and I haven't set this to "no", 
and yet the data was deleted.

Can someone clarify if this setting is no longer valid, or if the default 
has been changed and not documented?



Re: shingle filter for sub phrase matching

2015-04-20 Thread brian
Did you ever figure this out? I have the same exact issue but using 
different words.  

On Wednesday, July 23, 2014 at 10:37:03 AM UTC-4, Nick Tackes wrote:

 I have created a gist with an analyzer that uses filter shingle in 
 attempt to match sub phrases. 

 For instance I have entries in the table with discrete phrases like 

 EGFR 
 Lung Cancer 
 Lung 
 Cancer 

 and I want to match these when searching the phrase 'EGFR related lung 
 cancer'. 

 My expectation is that the multi word matches score higher than the single 
 matches, for instance... 
 1. Lung Cancer 
 2. Lung 
 3. Cancer 
 4. EGFR 

 Additionally, I tried a standard analyzer match but this didn't yield the 
 desired result either. One complicating aspect to this approach is that the 
 min_shingle_size has to be 2 or more. 

 How then would I be able to match single words like 'EGFR' or 'Lung'? 

 thanks

 https://gist.github.com/nicktackes/ffdbf22aba393efc2169.js





Re: Creating a dynamic_template with a path_match of arbitrary depth

2015-04-07 Thread Brian Levine
https://github.com/elastic/elasticsearch/issues/10467

On Friday, April 3, 2015 at 10:36:00 AM UTC-4, Brian Levine wrote:

 I'm indexing documents with nested objects where some of the objects 
 include unique ids (GUIDS). I want all such fields to be not_analyzed. 
 The id fields always have an '_id' suffix; however, these fields can appear 
 at arbitrary levels in the document hierarchy. I'm trying to come up with a 
 dynamic mapping template to address this so that any field of the form 
 *_id regardless of the nesting depth will be marked as not_analyzed. I 
 don't think there's a way to specify this as a single path_match, but I 
 just wanted to confirm that I'm not missing something.  In practice, I 
 suppose the nesting will never go deeper than let's say, 5.  So I could 
 define 5 path_match patterns like *_id, *.*_id, *.*.*_id...etc. Although 
 experience shows that the moment I do this, we'll find the need to go to 6 
 levels ;-). Ideally, you'd be able to specify a path in ANT-like syntax 
 e.g., **/*_id.  Maybe I'll write up an enhancement request for this.

 Thanks.

 -b




Creating a dynamic_template with a path_match of arbitrary depth

2015-04-03 Thread Brian Levine
I'm indexing documents with nested objects where some of the objects 
include unique ids (GUIDS). I want all such fields to be not_analyzed. 
The id fields always have an '_id' suffix; however, these fields can appear 
at arbitrary levels in the document hierarchy. I'm trying to come up with a 
dynamic mapping template to address this so that any field of the form 
*_id regardless of the nesting depth will be marked as not_analyzed. I 
don't think there's a way to specify this as a single path_match, but I 
just wanted to confirm that I'm not missing something.  In practice, I 
suppose the nesting will never go deeper than let's say, 5.  So I could 
define 5 path_match patterns like *_id, *.*_id, *.*.*_id...etc. Although 
experience shows that the moment I do this, we'll find the need to go to 6 
levels ;-). Ideally, you'd be able to specify a path in ANT-like syntax 
e.g., **/*_id.  Maybe I'll write up an enhancement request for this.
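
Just to make that workaround concrete, here is a rough sketch of what I have in 
mind (untested; the template names are made up, and only three levels are shown):

{
  "mappings" : {
    "_default_" : {
      "dynamic_templates" : [
        { "ids_level_1" : {
            "path_match" : "*_id",
            "mapping" : { "type" : "string", "index" : "not_analyzed" }
        } },
        { "ids_level_2" : {
            "path_match" : "*.*_id",
            "mapping" : { "type" : "string", "index" : "not_analyzed" }
        } },
        { "ids_level_3" : {
            "path_match" : "*.*.*_id",
            "mapping" : { "type" : "string", "index" : "not_analyzed" }
        } }
      ]
    }
  }
}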

Thanks.

-b



Re: Do I have to explicitly exclude the _all field in queries?

2015-03-24 Thread Brian Levine
OK, I think I figured this out.  It's the space between "Brian" and 
"Levine". The query:

"query": "Node.author:Brian Levine"

is actually interpreted as "Node.author:Brian OR Levine", in which case 
"Levine" is searched for in the _all field.  Seems so obvious now!  ;-)
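
For anyone else hitting the same thing: quoting the value keeps both words bound 
to the field instead of letting the second one fall through to _all. Something 
along these lines is what I mean (untested against my mapping):

{
  "query": {
    "query_string": {
      "query": "Node.author:\"Brian Levine\""
    }
  },
  "fields": [ "Node.author" ]
}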

-b


On Tuesday, March 24, 2015 at 1:50:42 PM UTC-4, Brian Levine wrote:

 Hi all,

 I clearly haven't completely grokked something in how QueryString queries 
 are interpreted.  Consider the following query:

 {
 query: {
 query_string: {
analyze_wildcard: true,
query: Node.author:Brian Levine
 }
 },
 fields: [Node.author],
 explain:true
 }

 Note:  The Node.author field is not_analyzed.

 The results from this query include documents for which the Node.author 
 field contains neither Brian nor Levine.  In examining the the 
 explanation, I found that the documents were included because another field 
 in the document contained Levine.  A snippet from the explanation shows 
 that the _all field was considered:

 {
 value: 0.08775233,
 description: weight(_all:levine in 464) [PerFieldSimilarity], 
 result of:,
 ...

 Do I need to explicitly exclude the _all field in the query? 

 Separate question: Because the Node.author field is not_analyzed, I had 
 thought that the value Brian Levine would also not be analyzed and 
 therefore only documents whose Node.author field contained Brian Levine 
 exactly would be matched, yet the explanation shows that the brian and 
 levine tokens were considered. I also noticed that if I change the query 
 to:

 query: Node.author:(Brian Levine)

 then result set changes. Only the documents whose Node.author field 
 contains either brian OR levine are included (which is what I would 
 have expected). According to the explanation, the _all field is not 
 considered in this query.

 So I'm confused.  Clearly, I don't understand how my original query is 
 interpreted.

 Hopefully, someone can enlighten me.

 Thanks.

 -brian





Do I have to explicitly exclude the _all field in queries?

2015-03-24 Thread Brian Levine
Hi all,

I clearly haven't completely grokked something in how QueryString queries 
are interpreted.  Consider the following query:

{
  "query": {
    "query_string": {
      "analyze_wildcard": true,
      "query": "Node.author:Brian Levine"
    }
  },
  "fields": [ "Node.author" ],
  "explain": true
}

Note:  The Node.author field is not_analyzed.

The results from this query include documents for which the Node.author 
field contains neither "Brian" nor "Levine".  In examining the 
explanation, I found that the documents were included because another field 
in the document contained "Levine".  A snippet from the explanation shows 
that the _all field was considered:

{
  "value": 0.08775233,
  "description": "weight(_all:levine in 464) [PerFieldSimilarity], result of:",
  ...

Do I need to explicitly exclude the _all field in the query? 

Separate question: Because the Node.author field is not_analyzed, I had 
thought that the value "Brian Levine" would also not be analyzed and 
therefore only documents whose Node.author field contained "Brian Levine" 
exactly would be matched, yet the explanation shows that the "brian" and 
"levine" tokens were considered. I also noticed that if I change the query 
to:

"query": "Node.author:(Brian Levine)"

then the result set changes. Only the documents whose Node.author field 
contains either "brian" OR "levine" are included (which is what I would 
have expected). According to the explanation, the _all field is not 
considered in this query.

So I'm confused.  Clearly, I don't understand how my original query is 
interpreted.

Hopefully, someone can enlighten me.

Thanks.

-brian



Re: Elastic and Kibana, indexing a JSON with an array field looks like a plain String.

2015-02-17 Thread Brian Lowrance
Did you happen to figure out a solution for this?



Elasticsearch throwing an “OutOfMemoryError[unable to create new native thread]” error

2015-01-30 Thread brian
We just did a rolling restart of our server, but now every few hours our 
cluster stops responding to API calls. Instead, when we make a call, I get 
a response like this: 

curl -XGET 'http://localhost:9200/_cluster/health?pretty'
{ 
  "error" : "OutOfMemoryError[unable to create new native thread]", 
  "status" : 500 
} 

I noticed that we can still index data fine, it seems, but cannot search or 
call any API functions. This seems to happen every few hours, and the most 
recent time it happened, there were no logs in any of the nodes' log files. 

Our cluster is 8 nodes over 5 servers (3 servers run 2 elasticsearch 
processes, 2 run 1), running RHEL6u5. We are running Elasticsearch 1.3.4. 



Re: How does sorting on _id work?

2015-01-27 Thread Brian
Yes, the _id field is a string. You are not limited to numbers. In fact, an 
automatically generated ID has many non-numeric characters in it.

For what you want, you should create an id field, map it to a long integer, 
and then copy your _id into that id field when you load the document. Then 
when you sort on the id field, you will get a numeric sort.
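
A rough sketch of what I mean, reusing the index and type from your example 
(untested; your other document fields are omitted):

PUT /my_index/_mapping/order
{
  "order": {
    "properties": {
      "id": { "type": "long" }
    }
  }
}

PUT /my_index/order/99
{ "id": 99 }

GET /my_index/order/_search
{
  "sort": [ { "id": { "order": "desc" } } ],
  "size": 1
}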

Hope this helps.

Brian

On Tuesday, January 27, 2015 at 1:28:44 PM UTC-5, Abid Hussain wrote:

 ... can it be that _id is treated as a string? If so, is there any way to 
 retrieve the max _id value while treating _id as an integer?

 Am Dienstag, 27. Januar 2015 19:24:41 UTC+1 schrieb Abid Hussain:

 Hi all,

 I want to determine the doc with max and min _id value. So, when I run 
 this query:
 GET /my_index/order/_search
 {
   "fields": [ "_id" ],
   "sort": [
     { "_uid": { "order": "desc" } }
   ],
   "size": 1
 }

 I get a result:

 {
   ...
   "hits": {
     ...
     "hits": [
       {
         "_index": "my_index",
         "_type": "order",
         "_id": "99",
         "_score": null,
         "sort": [
           "order#99"
         ]
       }
     ]
   }
 }

 There is definitely a doc with _id value 11132106 in the index, which I 
 would have expected as the result.

 And, when I run the same search with order asc I get a result with 
 _id 100, which is higher than 99...?

 What am I doing wrong?

 Regards,

 Abid





Re: How to find all docs where field_a === val1 and field_b === val2?

2015-01-14 Thread Brian
By the way, David, the full query follows:

{
  "from" : 0,
  "size" : 20,
  "timeout" : 6,
  "query" : {
    "bool" : {
      "must" : [ {
        "match" : {
          "field_a" : {
            "query" : "val1",
            "type" : "boolean"
          }
        }
      }, {
        "match" : {
          "field_b" : {
            "query" : "val2",
            "type" : "boolean"
          }
        }
      } ]
    }
  },
  "version" : true,
  "explain" : false,
  "fields" : [ "_ttl", "_source" ]
}

Also note that since the _ttl field is being requested (always), then the 
_source must also be asked for explicitly. If you don't ask for any fields, 
_source is returned by default. But if you ask for one or more fields 
explicitly, then you must also ask for _source or it won't be returned.

Brian

On Wednesday, January 14, 2015 at 6:31:29 PM UTC-5, Brian wrote:

 David,

 This is what I use. I hope it helps.

 {
   *bool* : {
 *must* : [ {
   match : {
 field_a : {
   query : val1,
   type : boolean
 }
   }
 }, {
   match : {
 field_b : {
   query : val2,
   type : boolean
 }
   }
 } ]
   }
 }

 Brian




Re: How to find all docs where field_a === val1 and field_b === val2?

2015-01-14 Thread Brian
David,

This is what I use. I hope it helps.

{
  "bool" : {
    "must" : [ {
      "match" : {
        "field_a" : {
          "query" : "val1",
          "type" : "boolean"
        }
      }
    }, {
      "match" : {
        "field_b" : {
          "query" : "val2",
          "type" : "boolean"
        }
      }
    } ]
  }
}

Brian



Re: Indices Stats using the NodeClient with the Java API

2015-01-13 Thread Brian
Marc,

Maybe these snippets will help?

The enclosing class's constructor sets the client data member to either a 
Node client or Transport client (both work fine; I prefer the 
TransportClient).

The source string contains one or more index names, with a comma between 
each pair of names. A name may contain wildcards as supported by ES.

This code works for 1.3.4. Not sure if 4.X has yet another breaking change, 
but if that's the case, it is usually no big deal to handle.

public static String[] parseIndex(String source)
{
  return indexSplitter.split(source);
}

public String[] getIndexNames(String indexPattern) throws UtilityException
{
  if (indexPattern.trim().isEmpty())
    throw new UtilityException("Cannot resolve empty index pattern ["
        + indexPattern + "]");

  try
  {
/*
 * Parse the index pattern on commas, if present. Then pass the list of
 * individual names (which may include wildcards and - signs) to
 * Elasticsearch for final resolution
 */
String[] indexSpecList = parseIndex(indexPattern);

/*
 * Get the list of individual index names, along with their status
 * information (which we will ignore in this method)
 */
IndicesAdminClient iac = client.admin().indices();
RecoveryRequestBuilder isrb = iac.prepareRecoveries();
isrb.setIndices(indexSpecList);
RecoveryResponse isr = isrb.execute().actionGet();

/* Create an array of just the names of the indices */
ArrayList<String> indices = new ArrayList<String>();
Map<String, List<ShardRecoveryResponse>> sr = isr.shardResponses();
for (String index : sr.keySet())
{
  indices.add(index.trim());
}

/* Be sure there is at least one index that matches the pattern */
if (indices.isEmpty())
  throw new UtilityException("Cannot resolve index pattern ["
      + indexPattern + "] to at least one existing index");

/* Convert to String[] and return */
return indices.toArray(new String[indices.size()]);
  }
  catch (ElasticsearchException e)
  {
    throw new UtilityException("Cannot resolve index pattern ["
        + indexPattern + "]: " + e);
  }
}

private final Client client;

private static final Pattern indexSplitter = Pattern.compile(Pattern
    .quote(","));


Brian


On Tuesday, January 13, 2015 at 6:56:07 AM UTC-5, Marc wrote:

 Hi,

 I would like to get a list of the available indices in my cluster using 
 the java api using the node client.

 Currently the request is done via the REST interface similar to this:
 http://localhost:9200/logstash-*/_stats/indices

 Cheers
 Marc





Re: Input file with custom delimiter

2015-01-07 Thread Brian
Gopi,

You really have a CSV file but using ^ instead of , as your delimiter.

I happened to write my own CSV-to-JSON converter, giving it the options I 
needed (including specification or auto-detection of numbers, date format 
normalization, auto-creation of the action and metadata line, and so on). 
I did this before stumbling across logstash, but still found it easier to 
write and maintain this code myself.

Choose the language you wish: I wrote one version of mine in C++ but the 
subsequent version in Java. I also wrote a bulk load client in Java to 
avoid the limitations of curl (and also its complete lack of existence on 
various platforms).

(logstash is much better for log files; my converter is much better for 
generic CSV)

I know this isn't exactly the pre-written tool you are looking for. But 
converting the CSV (with the option to override the delimiter values) into 
JSON isn't very hard to do. And once that's done, it's an easy matter to 
add the action and meta data and have a bulk-ready data stream.
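
Just to illustrate the end result: a bulk-ready stream is simply an 
action-and-metadata line followed by the converted document line, one pair per 
record, POSTed to the _bulk endpoint. A hand-written sketch (the index, type, 
and field names here are made up):

{ "index" : { "_index" : "myindex", "_type" : "mytype", "_id" : "1" } }
{ "col1" : "X", "col2" : "YY" }
{ "index" : { "_index" : "myindex", "_type" : "mytype", "_id" : "2" } }
{ "col1" : "A", "col2" : "BB" }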

Brian

On Wednesday, January 7, 2015 6:40:34 AM UTC-5, Gopimanikandan Sengodan 
wrote:

 Hi All,

 We are planning to load the data to elastic search from the delimited file.

 The file has been delimited with 0x88(ˆ) delimiter. 

 Can you please let me know how to load the delimited file to Elastic?

 Also, please let me know what is the best and fastest way to load 
 millions of records into Elasticsearch.


 SAMPLE:

 XˆYYˆ

 Thanks,
 Gopi




Re: Input file with custom delimiter

2015-01-07 Thread Brian
I wish I could, but currently prohibited. However, I can point you to some 
very good Java libraries:

The CSV parser supplied by the Apache project works well:

https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVParser.html

You can override the delimiter using the static CSVFormat newFormat(char 
delimiter) method which creates a new CSV format with the specified 
delimiter:

https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html

Then use the XContentBuilder cb = jsonBuilder() method call to create a 
content builder to convert your records to single-line JSON.

For example, the action and meta data object I use is based on the 
following enum and toString method to emit as JSON. I've left out the parts 
that I use in other custom libraries that allow Java code to easily set up 
this information, and also to set this from a search response or a 
get-by-id response:

  public enum OpType
  {
CREATE,
INDEX,
DELETE
  }

  @Override
  public String toString()
  {
try
{
  XContentBuilder cb = jsonBuilder();
  cb.startObject();

  cb.field(opType.toString().toLowerCase());
  cb.startObject();

  cb.field("_index", index);
  cb.field("_type", type);
  if (id != null)
    cb.field("_id", id);

  if (version > 0)
  {
    cb.field("_version", version);
    if (versionType == VersionType.EXTERNAL)
      cb.field("_version_type", "external");
  }

  if (ttl != null)
    cb.field("_ttl", ttl);

  cb.endObject();

  cb.endObject();
  return cb.string();
}
catch (IOException e)
{
  return (null);
}

  }

  /* Operation type (action): create or index or delete */
  private OpType  opType  = OpType.INDEX;

  /* Metadata that this object supports */
  private String  index   = null;
  private String  type= null;
  private String  id  = null;
  private long        version = 0;
  private VersionType versionType = VersionType.INTERNAL;
  private TimeValue   ttl = null;

And the actual data line that would follow is similarly constructed using 
the content builder.

I wish I could help you more.

Brian


On Wednesday, January 7, 2015 10:41:26 AM UTC-5, Gopimanikandan Sengodan 
wrote:

 Thank you Brian.  Let me change it accordingly as per your suggestion.  
 Would it be possible to share the bulk load client and CSV-to-JSON converter?




Re: Incompatible encoding when using Logstash to ship JSON files to Elasticsearch

2014-12-10 Thread Brian
We use the HTTP protocol from logstash to send to Elasticsearch, and 
therefore we have never had this issue. 

There is a version of ES bundled with logstash, and if it doesn't match the 
version of ES you are using to store the logs then you may see problems if 
you don't use the HTTP protocol.

Brian

On Wednesday, December 10, 2014 3:53:30 PM UTC-5, Vagif Abilov wrote:

 Thank you Aaron, done. I've created an issue. But I'd like to find out if 
 there's a workaround for this problem. What's really strange is that the same 
 Logstash installation works with similar JSON files on other machines.




Re: Query doesn't find results

2014-12-05 Thread Brian
I believe that the query_string has its own syntax that assumes 
tokenization and other preprocessing.

Maybe if you added name: to the query? But I am not sure how you would tell 
the query string that your phrase is really one token.

But, thanks for giving me one more reason to avoid Spring!

Brian



Cannot find elasticsearch sample data in kibana4 beta 2

2014-12-03 Thread Brian Olson
I'm having some difficulties getting some non-logstash data to show up in 
kibana4. All logstash data works fine. I loaded up the french data as 
suggested on the elasticsearch help page 
(http://www.elasticsearch.org/help) and everything works as far as 
elasticserach is concerned. I can successfully load, map, and query the 
data from the CLI. In Kibana4, I can add the index and it reads all of the 
fields. The time-field-name gives 3 options as expected (matches the 
mapping). I chose date_creation (2012-06-21 05:46:59) and then search 
for the data in Kibana with no success. 

Just for kicks I loaded up kibana3 and I'm able to see the data with no 
date filtering. I changed the time_picker field to date_creation and 
again searched for the data in 2012. Nothing. Once I select a timeframe the 
data no longer appears. 

Does this seem like an Elasticsearch mapping issue or something in Kibana? 
Thanks in advance for any thoughts you may have. 

SOFTWARE
elasticsearch 1.4.0 Beta 1, Build 3998
kibana 4.0.0 Beta 2



Re: Cannot find elasticsearch sample data in kibana4 beta 2

2014-12-03 Thread Brian Olson
Upgraded to elasticsearch 1.4.1 - no change

On Wednesday, December 3, 2014 12:53:42 PM UTC-5, Brian Olson wrote:

 I'm having some difficulties getting some non-logstash data to show up in 
 kibana4. All logstash data works fine. I loaded up the french data as 
 suggested on the elasticsearch help page (
 http://www.elasticsearch.org/help) and everything works as far as 
 Elasticsearch is concerned. I can successfully load, map, and query the 
 data from the CLI. In Kibana4, I can add the index and it reads all of the 
 fields. The time-field-name gives 3 options as expected (matches the 
 mapping). I chose date_creation (2012-06-21 05:46:59) and then search 
 for the data in Kibana with no success. 

 Just for kicks I loaded up kibana3 and I'm able to see the data with no 
 date filtering. I changed the time_picker field to date_creation and 
 again search for the data in 2012. Notta. Once I select a timeframe the 
 data no longer appears. 

 Does this seem like an Elasticsearch mapping issue or something in 
 Kibana. Thanks in advance for any thoughts you may have. 

 SOFTWARE
 elasticsearch 1.4.0 Beta 1, Build 3998
 kibana 4.0.0 Beta 2





Re: tweezer fixes to status-red don't work, may need sledgehammer

2014-11-24 Thread Brian
Just a wild guess, but it seems that the /etc/init.d/elasticsearch restart 
command will, as its name implies, stop a currently running instance and then 
start it.

If you issue the curl _shutdown command and then the restart command 
directly after without any delays, then perhaps that double blow from your 
sledgehammer is causing some corruption.

In general, it's not good to mix HTTP REST (curl) commands and scripts that 
directly handle processes without adequate delays to ensure they aren't 
hammering on each other.

Brian



Spark StreamingContext read streaming updates from ES

2014-11-19 Thread Brian Walsh
Existing Spark support allows us to read or write to ES.

Read support is one-shot, that is, it reads what ES has in its index now.

I'd like to have a Spark thread read streaming updates from ES, using it as 
a source not a sink.

I was wondering if there was a way to write a spark StreamingContext that 
will observe updates to ES?

Something like

ssc.elasticSearchStream(...)

Thanks for your time.

-b



Re: Question about Logstash Joining ES Cluster and Index

2014-11-14 Thread Brian
I highly recommend that you use the HTTP output. Works great, is immune to 
the ES version, and there are no performance issues that I've seen. It Just 
Works. 

For example, here's my sample logstash configuration's output settings:

output {
  # Uncomment for testing only:
  # stdout { codec => rubydebug }

  # Elasticsearch
  elasticsearch {
     # Specify "http" (with or without quotes around http) to direct the
     # output as JSON documents via the Elasticsearch HTTP REST API
     protocol => "http"
     codec => "json"
     manage_template => false

     # Or whatever target ES host is required
     host => "localhost"

     # Or whatever _type is desired:
     index_type => "sample"
  }
}

As you can probably surmise, I have my own default index creation template 
so there's no need to splatter it all over creation; logstash runs better 
on the host on which it's gathering the log files, and I vastly prefer one 
central index template to keeping a bazillion logstash configurations in 
perfect sync. And if we happen to replace logstash with something else, then I 
still have my index creation templates.
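
Loading that one central template is a one-time curl call, roughly like this 
(the template name and file name here are placeholders):

curl -XPUT 'http://localhost:9200/_template/logstash_default' -d @my-default-template.json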

Hope this helps!

Brian



Re: Import java transportClient only

2014-11-14 Thread Brian
Filip,

Or, just put all of the Elasticsearch jars on your local client system, 
then add their containing directory (with /* appended to it) to your 
-classpath, and your client can use the TransportClient. Java will pull in 
exactly what it needs and nothing it doesn't. And your client code stays 
tiny. Works great for us!

Brian



Re: Import java transportClient only

2014-11-14 Thread Brian
David,

On each machine on which either ES or a client is deployed, we have the 
following directory which contains all of the jars that are packaged with 
ES:

/opt/db/current/elasticsearch-1.3.4/lib

Then the java command's -classpath includes 
/opt/db/current/elasticsearch-1.3.4/lib/* (along with our own custom jars 
via /opt/db/lib/*) and everything works fine.

As for additional 3rd party jars, I have the following:

1. Jackson. The full library is used instead of the one inside ES.
2. Netty. This was needed for my own REST API which hides ES and contains 
the business logic. I couldn't figure out how to easily use the shaded 
version inside ES, and the real Netty is as easy to use as falling off a 
log.
3. The LMAX Disruptor .jar file. This thing combines nicely with Netty and 
wow! Netty and application thread counts remain low even under heavy loads.

Everything else I get directly from ES. And I love the way it shades its 
versions of Netty and Jackson so it's very easy for my own app to cherry 
pick what it wants from ES and what it prefers outside of ES.

We could use maven, I suppose, but we don't. Instead, we package all of the 
jars into a zip archive after our application is built against a specific 
ES version. And then that single self-contained zip archive is installed 
where it is needed. And there is no need for an external or internal maven 
repo. Not a big deal for us.

All in all, it's much like how Elasticsearch itself is packaged and 
distributed: A zip archive that I download from the web site. I would never 
use a .deb or .rpm since the version that I want is always on the web site. 
And I believe there is a maven repo but the .zip archive links are right on 
the web site, and we don't update all that often (regularly, but I don't 
thrash our deployment folks).

It sounds complicated, I suppose. But that was only once, and it's been 
easy to manage and develop against, easy to deploy, and makes me look very, 
very good to our deployment folks.

Brian

P.S. I don't use Guice or Spring. I don't see any problem with the new 
operator, and the services I create are fast, rock-solid, easy to configure 
and deploy, and that puts me light-years ahead of much of the pack. But 
this is another topic altogether! :-)

On Saturday, November 15, 2014 12:24:28 AM UTC-5, David Pilato wrote:

 Hi Brian,

 I think I'm missing something.
 At the end you still have the full elasticsearch jars, right?
 What is the difference with having that as a maven dependency?

 Is it a way for not getting all elasticsearch dependencies which are 
 shaded in elasticsearch jar such as Jackson, Guice,... ?

 David




Re: mapping store:true vs store:false

2014-11-14 Thread Brian
Especially when feeding log data via logstash, I have never used store:true 
and have found no need to specify it at all. The logstash JSON will be 
stored as the _source and retrieved by the query so there is no need to use 
store at all.

Anyway, that's my experience.

Brian



Re: Question on stemming + synonyms and tokenizerFactory

2014-11-14 Thread Brian
Once you have your mapping set up, then create an application that itself 
constructs the analyzer you need. Then feed it your real words and let it 
generate the stemmed versions.

I don't think that ES can be told to do this; but it provides the classes 
you need to do it yourself.

For my own synonym processing, I do a Very Bad Thing. I create a synonym 
_type and then each document contains a list of words or phrases that are 
synonyms of each other. For a synonym query, I first query my synonym type. 
Then I OR the queries for each of the matching synonym words or phrases.
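
A rough sketch of the two steps (all index, type, and field names here are made 
up, and I actually do this with the Java API rather than raw REST calls):

GET /myindex/synonym/_search
{ "query": { "match": { "words": "car" } } }

If that returns a synonym document such as { "words": [ "car", "auto", 
"automobile" ] }, the real query is then built by ORing one clause per word:

GET /myindex/mytype/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "car" } },
        { "match": { "title": "auto" } },
        { "match": { "title": "automobile" } }
      ]
    }
  }
}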

This is also much easier to maintain: I can update the synonyms on the fly 
and do not need to reindex the data at all. Not at all.

But it requires additional code, and it works best using the Java API. And 
some folks have indicated there are serious performance issues making this 
a Bad Solution. But I have not seen any problems with performance.

Oh, and all my words and phrases can be fully spelled out; it's only when 
they are used in the subsequent query that they get analyzed (tokenized, 
stemmed, and whatever else).

Brian



Re: Disabling dynamic mapping

2014-11-10 Thread Brian
This is what I put into the elasticsearch.yml file when I start 
Elasticsearch for use in a non-ELK environment:

# Do not automatically create an index when a document is loaded, and do
# not automatically index unknown (unmapped) fields:

action.auto_create_index: false
index.mapper.dynamic: false

And here's a complete example of a curl input document that I use to create 
an index with the desired types in which I don't want new indices, new 
types, or new fields to be automatically created:

{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "analysis" : {
        "char_filter" : { },
        "filter" : {
          "english_snowball_filter" : {
            "type" : "snowball",
            "language" : "English"
          }
        },
        "analyzer" : {
          "english_standard_analyzer" : {
            "type" : "custom",
            "tokenizer" : "standard",
            "filter" : [ "standard", "lowercase", "asciifolding" ]
          },
          "english_stemming_analyzer" : {
            "type" : "custom",
            "tokenizer" : "standard",
            "filter" : [ "standard", "lowercase", "asciifolding", 
"english_snowball_filter" ]
          }
        }
      }
    }
  },
  "mappings" : {
    "_default_" : {
      "dynamic" : "strict"
    },
    "person" : {
      "_all" : {
        "enabled" : false
      },
      "properties" : {
        "telno" : {
          "type" : "string",
          "analyzer" : "english_standard_analyzer"
        },
        "gn" : {
          "type" : "string",
          "analyzer" : "english_standard_analyzer"
        },
        "sn" : {
          "type" : "string",
          "analyzer" : "english_stemming_analyzer"
        },
        "o" : {
          "type" : "string",
          "analyzer" : "english_stemming_analyzer"
        }
      }
    }
  }
}

By the way, I never mix indices that are used for more standard database 
queries with the indices used by the ELK stack. Those are two separate 
Elasticsearch clusters entirely; the former is locked down as shown above, 
while the latter is left in its default free form method of automatically 
creating indices and new fields on the fly, just as Splunk and ELK and 
other log analysis tools do.

I hope this helps.

Brian

On Monday, November 10, 2014 10:45:38 AM UTC-5, pulkitsinghal wrote:

 What does the json in the CURL request for this look like?

 The dynamic creation of mappings for unmapped types can be completely 
  disabled by setting index.mapper.dynamic to false.

 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html#mapping-dynamic-mapping

 Thanks!
 - Pulkit




Re: ES 1.3.4 scrolling never ends

2014-11-10 Thread Brian
A while back, I wrote my own post-query response sorting so that I could 
handle cases that Elasticsearch didn't. One case was sorting a scan query. 
I used a Java TreeSet class and could also limit it to the top 'N' 
(configurable) items. It is very, very quick, pretty much adding no 
overhead to the existing scan logic. And it supports an arbitrarily complex 
compound sort key, much like an SQL ORDER BY clause; it's very easy to 
construct.

Probably not useful for a normal user query, but it is very useful for an 
ad-hoc query in which I wish to scan across an indeterminately large result 
set but still sort the results. 

One of these days, it might make a good plug-in candidate. But I am not 
sure how to integrate it with the scan API, so for now it's just part of 
the Java client layer.

Brian



Re: ES cluster become red

2014-11-10 Thread Brian
Moshe,

Exactly!

What you might wish to do is add a Wait for Yellow query before doing any 
queries, or a Wait for Green request before doing any updates. That way, 
you can deterministically wait for the appropriate status before continuing.

For example: Loop on the following until it succeeds, some timeout expires 
after repeatedly catching NoNodeAvailableException, or else some other 
serious exception is thrown:

client.admin().cluster().prepareHealth().setTimeout(timeout)
.setWaitForYellowStatus().execute().actionGet();
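
For reference, the REST equivalent of that wait is roughly the following 
(adjust the status and timeout to whatever your situation needs):

curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s&pretty'
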
Hope this helps!

Brian

On Sunday, November 9, 2014 8:22:58 AM UTC-5, Moshe Recanati wrote:

 Update
  After a couple of seconds or minutes the cluster became green.
  I assume this is after ES stabilized with the data.






Re: get list of all the fields in indextype of index using java api

2014-11-06 Thread Brian
I do this by getting the mappings for a specific index, then isolating by 
type if desired. This takes care of all explicitly mapped fields, and also 
any automatically detected and mapped fields.

Especially in the latter case, it's a good way to check and see if 
Elasticsearch is guessing your automatically mapped fields the way you 
expect.
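
For reference, the REST equivalent of what I described is roughly the following 
(the index and type names are placeholders); the Java indices admin client 
exposes the same mapping information:

curl -XGET 'http://localhost:9200/myindex/_mapping/mytype?pretty'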

Brian



Re: ES 1.3.4 scrolling never ends

2014-11-05 Thread Brian
You need to get the scroll ID from each response and use that one in the 
subsequent scan search. You cannot simply reuse the same scroll ID.
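
In other words, the flow looks roughly like this (untested sketch; the index 
name is made up):

# Initial scan request: returns a _scroll_id but no hits yet
curl -XGET 'http://localhost:9200/myindex/_search?search_type=scan&scroll=1m&size=100' -d '{
  "query" : { "match_all" : {} }
}'

# Each subsequent request must send the _scroll_id returned by the PREVIOUS
# response; every response hands back a new _scroll_id for the next call
curl -XGET 'http://localhost:9200/_search/scroll?scroll=1m' -d 'SCROLL_ID_FROM_LAST_RESPONSE'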

Brian



incomplete results?

2014-11-04 Thread brian
Hi,

Elasticsearch noob here, so forgive me if I get terminology wrong, etc.

Basically I'm loading a bunch of documents into an Elasticsearch index, 
currently sitting at 17,000. One of the fields I'm using is called "md5" 
and it's not_analyzed in the mapping (I also tried without that, to solve this 
problem, to no avail).

When looking at the data in Kibana, I've added a panel looking for the 
top N md5s and can see they have various values. However, when I select one 
of those (using the search icon), the number is actually higher than what 
was originally displayed in the panel (it was 10, but selecting the top md5 
actually shows 15). I've tried copying the exact queries from the 
'inspect' options (showing all and the individual md5) and running them 
against Elasticsearch using curl, and the same results show up.

I've tried renaming the md5 field to md5_hash and the same problem occurs. 
I would appreciate any insight as to what may be happening here, as I've 
tried everything I can think of.

- brian



Re: Differences about label your fields with or without @ in Kibana

2014-10-29 Thread Brian
The @timestamp field, created by logstash by default, has always worked 
perfectly out-of-the-box with Kibana's time picker and also with curator. 
Perhaps if you posted one document from your Elasticsearch response it 
might help.

But I don't recommend that you create your own fields with @ as a prefix 
character. Straying a bit from your question, I created some R scripts to 
analyze and plot things in a way that neither Kibana nor Splunk can. What 
I've noticed is that when I export as CSV, either from Elasticsearch or 
from Splunk, and then import into R's CSV reader, I notice that:

1. Elasticsearch's @timestamp field becomes the X.timestamp field in R.

2. Splunk's _time field becomes the X_time field in R.

That is one very good reason not to add an @ or _ to the front of your own 
fields. It's a lot of extra hard-coded processing to figure out the source 
and then choose the field using R when it's not the same name as the field 
from Elasticsearch.

But I digress.

Brian

On Wednesday, October 29, 2014 1:20:10 PM UTC-4, Iván Fernández Perea wrote:

 I was using Kibana and wondering which are the differences between using 
 or not  an @ sign before field names. It seems that the default (as in 
 timepicker in the dashboard settings) is using the @ before a field but it 
 doesn't seem to work in my case. I need to set the Time Field in the 
 Timepicker with a field name and no @ before it to make it work.

 Thank you,
 Iván.




Re: Need help to create array type field in elastic search

2014-10-14 Thread Brian
There is nothing special you need to add to your mapping to enable multiple 
values for a field. Just pass in an array of values instead of a single 
value, and all of the values are analyzed.

One thing you might want to add for string fields with multiple values:

position_offset_gap : n

When a string field is analyzed, it typically assigns a position to each 
token that is one greater than the position of the previous token. By 
setting a position offset gap value to n, it skips ahead that many 
positions, representing the number of non-matching word positions between 
consecutive values.

What this does is that if your field contains multiple values that each have 
multiple words, a phrase query won't span across values unless the slop 
value is large enough (n or larger, I seem to recall).

Hope this helps.

Brian





Re: snapshot/restore

2014-10-07 Thread Brian
It looks like it was first introduced in:

1.0.0.Beta2 (December 2, 2013): http://www.elasticsearch.org/downloads/1-0-0-beta2/

   - Snapshot/Restore API – Phase 1 #3826 
     http://github.com/elasticsearch/elasticsearch/issues/issue/3826

This preceded 0.90.9, so I would suspect that it's in your 0.90.10 version 
as well.

Brian



Re: Logstash into Elasticsearch Mapping Issues

2014-10-06 Thread Brian
I haven't ever let logstash set the default mappings. Instead, whenever a 
logstash-style index is created, I let Elasticsearch set the default 
mappings from its template. That way, it works even if I replace logstash 
with something else.

For example, my $ES_CONFIG/templates/automap.json file contains the following:

{
  "automap" : {
    "template" : "logstash-*",
    "settings" : {
      "index.mapping.ignore_malformed" : true
    },
    "mappings" : {
      "_default_" : {
        "numeric_detection" : true,
        "_all" : { "enabled" : false },
        "properties" : {
          "message" : { "type" : "string" },
          "host" : { "type" : "string" },
          "UUID" : { "type" : "string", "index" : "not_analyzed" },
          "logdate" : { "type" : "string", "index" : "no" }
        }
      }
    }
  }
}

And since logstash stores the entire message within the message field and I 
never modify that particular field, the _all field is disabled and 
Elasticsearch is told to use the message field as the default within a 
Kibana query via the following Java option when starting Elasticsearch as 
part of the ELK stack:

-Des.index.query.default_field=message

I hope this helps!

Brian

On Thursday, October 2, 2014 9:02:17 PM UTC-4, elo...@gmail.com wrote:

 Does anyone have an idea what to do in a situation where I am using the output 
 function in logstash to send data to an Elasticsearch cluster via protocol 
 http and using a JSON template, and the mappings in the JSON template 
 aren't being used in the Elasticsearch cluster? 

 logstash.conf

 input {
     tcp {
         port => 5170
         type => "sourcefire"
     }
 }

 filter {

     mutate {
         split => [ "message", "|" ]
         add_field => {
             "event" => "%{message[5]}"
             "eventSource" => "%{message[1]}"
         }
     }

     kv {
         include_keys => [ "dhost", "dst", "dpt", "shost", "src", "spt", "rt" ]
     }

     mutate {
         rename => [ "dhost", "destinationHost" ]
         rename => [ "dst", "destinationAddress" ]
         rename => [ "dpt", "destinationPort" ]
         rename => [ "shost", "sourceHost" ]
         rename => [ "src", "sourceAddress" ]
         rename => [ "spt", "sourcePort" ]
     }

     date {
         match => [ "rt", "UNIX_MS" ]
         target => "eventDate"
     }

     geoip {
         add_tag => [ "sourceGeo" ]
         source => "src"
         database => "/opt/logstash/vendor/geoip/GeoLiteCity.dat"
     }

     geoip {
         add_tag => [ "destinationGeo" ]
         source => "src"
         database => "/opt/logstash/vendor/geoip/GeoLiteCity.dat"
     }
 }

 output {
     if [type] == "sourcefire" {
         elasticsearch {
             cluster => "XXX-cluster"
             flush_size => 1
             manage_template => true
             template => "/opt/logstash/lib/logstash/outputs/elasticsearch/elasticsearch-sourcefire.json"
         }
     }
 }

 JSON Template

 {
 "template": "logstash-*",
 "settings": {
 "index.refresh_interval": "5s"
 },
 "mappings": {
 "Sourcefire": {
 "_all": {
 "enabled": true
 },
 "properties": {
 "@timestamp": {
 "type": "date",
 "format": "basicDateTimeNoMillis"
 },
 "@version": {
 "type": "string",
 "index": "not_analyzed"
 },
 "geoip": {
 "type": "object",
 "dynamic": true,
 "path": "full",
 "properties": {
 "location": {
 "type": "geo_point"
 }
 }
 },
 "event": {
 "type": "string",
 "index": "not_analyzed"
 },
 "eventDate": {
 "type": "date",
 "format": "basicDateTimeNoMillis"
 },
 "destinationAddress": {
 "type": "ip"
 },
 "destinationHost": {
 "type": "string",
 "index": "not_analyzed"
 },
 "destinationPort": {
 "type": "integer",
 "index": "not_analyzed"
 },
 "sourceAddress": {
 "type": "ip"
 },
 "sourceHost": {
 "type": "string",
 "index": "not_analyzed"
 },
 "sourcePort": {
 "type": "integer",
 "index": "not_analyzed"
 }
 }
 }
 }
 }




-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ed3eba42-7142-4b9a-8334-8463f519c9bc%40googlegroups.com.
For more options, visit https

Re: Logstash into Elasticsearch Mapping Issues

2014-10-06 Thread Brian
I also have the following Logstash output configuration:

output {
  # For testing only
  stdout { codec => rubydebug }

  # Elasticsearch via HTTP REST
  elasticsearch {
     protocol => "http"
     codec => "json"
     manage_template => false

     # Or whatever target ES host is required:
     host => "localhost"

     # Or whatever _type is desired: Usually the environment name
     # e.g. qa, devtest, prod, and so on:
     index_type => "sample"
  }
}

Brian

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e05143d2-a2fd-4365-932b-b4603b08165c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Kibana 3.1.1

2014-10-03 Thread Brian
Link went away (404); now it's back but still no release notes...

On Thursday, October 2, 2014 11:05:16 AM UTC-4, Brian wrote:

 Looks interesting. But no release notes?

 http://www.elasticsearch.org/downloads/kibana-3-1-1/

 Brian


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c2ecba78-0ff6-48c0-bd97-1b09c34f899f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Kibana 3.1.1

2014-10-02 Thread Brian
Looks interesting. But no release notes?

http://www.elasticsearch.org/downloads/kibana-3-1-1/

Brian

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7980fb38-ce7c-44f3-9630-182184d76f08%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: search for available fields

2014-10-02 Thread Brian
In that case, your strategy seems fine. ES has already done all the real 
work of creating the responses, and I would expect that iterating across 
them and gathering the fields into a Set should be rather quick.

However, you still might wish to get the mappings for the index. Why? 
Because once you've collected the subset of fields within your current 
response, you still can only search on the ones that are indexed. So for a 
general solution, you would perhaps want to skip over fields that are 
stored in the documents but not indexed.

Brian

On Tuesday, September 30, 2014 6:01:20 PM UTC-4, shooali wrote:

 Thank you Brian,

 However, I am looking for something slightly different. I don't want to 
 know all the fields for an index, I want to know for a certain subset of 
 the documents I have indexed, what are the relevant fields that I can 
 continue search on. For example:
 If I have 5 fields total for my index, Then I search for all document that 
 satisfy certain criteria, for example, all documents that FieldA equals 
 '5'. Only for those, what are the available fields that I can continue on 
 search on... 
 The way I thought I can do this is to go over all result of the first 
 search and collect all fields of these documents into a Set and use those. 
 My question is whether there is a better/more performant way to achieve 
 this goal.



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4ae1168c-2526-4ba6-937a-1f2b1bc90a0b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Pagination in elasticsearch

2014-10-01 Thread Brian
The setSize method specifies the maximum number of responses per shard when 
one of the *_AND_FETCH search types is used, as in your DFS_QUERY_AND_FETCH 
example.

So if you query an index with *N* shards and your query sets a size of *S*, 
the query can return up to *N* x *S* response hits.

Since you are writing this in Java, it's a relatively easy matter to make 
further adjustments on the response, limiting it to the page size you expect.
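
For comparison, here is a rough REST sketch of the same search (the index, 
type, and field name are lifted from your snippet; the search text is made 
up). Over HTTP the default query_then_fetch search type applies from and 
size to the merged result, so a size of 1 really does return a single hit; 
the Java equivalent is to drop the DFS_QUERY_AND_FETCH line or use 
SearchType.QUERY_THEN_FETCH:

$ curl -XGET 'localhost:9200/cs/csdl/_search?pretty=true' -d '{
  "from" : 0,
  "size" : 1,
  "query" : { "match" : { "title" : "some search text" } }
}'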

Brian

On Thursday, September 25, 2014 6:56:44 PM UTC-4, Malini wrote:

 I have

 SearchRequestBuilder srb = client.prepareSearch("cs").setTypes("csdl");
 srb.setSearchType(SearchType.DFS_QUERY_AND_FETCH);

 QueryBuilder qb = (QueryBuilder) QueryBuilders.matchQuery("title", 
 searchText);

 FilterBuilder fb = FilterBuilders.andFilter (
 FilterBuilders.termsFilter("elasticdb", searchDB),
 //get from date and to date
 
 FilterBuilders.rangeFilter("pubdate").gte("1890-09").lte("2014-08")
 );
 
 FilteredQueryBuilder builder = QueryBuilders.filteredQuery(qb, fb);

 FunctionScoreQueryBuilder functionbuilder = new 
 FunctionScoreQueryBuilder(builder)
 
 .add(FilterBuilders.termsFilter("category", "acm"), factorFunction(-30.0f));
 
 srb.setQuery(functionbuilder).setFrom(0).setSize(1);

 SearchResponse response = srb.execute().actionGet();
 SearchHit[] results = response.getHits().getHits();

 Even though I set from=0 and size = 1 ( to see only one result) I see more 
 than 1 results.

 How do we get this pagination working?

 Thanks in advance




-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/00c324bd-fad1-404c-8b28-4274ffe11c24%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Searching on _all vs bool query on a (large) number of fields

2014-10-01 Thread Brian
I would not say Diabolical. Perhaps not optimal based on Lucene's 
internal design.

But I do something similar with table-based synonyms. In other words, when 
matching a synonym of a word, I do not pre-build the database index with 
synonyms. Instead, I maintain a table (index/type) of words and their 
synonyms, query that table, retrieve the synonyms, and then create the 
second and final query that basically does an OR search across the word and 
its synonyms. (It's basically a group of should clauses, just like yours).

I find that performance is fine. And accuracy and usefulness is superior. 
For example, a user query for synonym of the wild-carded BIG* might find 
BIG, LARGE, HUGE and also BIGHORN, SHEEP. And so on; some of the synonym 
lists are rather long and with multiple words there are many should terms 
in the final query.

And even with the multiple queries (first to resolve the synonyms, and the 
second to OR across them), performance is remarkably fast. It might be 
pushing Lucene a little, but I like the improved accuracy, and the ability 
to easily and regularly modify my synonym lists without any need to rebuild 
the hundreds of millions of documents that I am querying.

So for your question, my suggestion is to go for it and it should perform 
well enough.
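
As a rough sketch of what that second query looks like (the index, field, 
and synonym values here are made up; the real list is whatever came back 
from the synonym table lookup), it is just a group of should clauses:

$ curl -XGET 'localhost:9200/docs/_search' -d '{
  "query" : {
    "bool" : {
      "should" : [
        { "match" : { "body" : "big" } },
        { "match" : { "body" : "large" } },
        { "match" : { "body" : "huge" } }
      ],
      "minimum_should_match" : 1
    }
  }
}'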

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4d6ba249-c8b6-4870-af96-ed71ee1b2f7e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Checking for tampering of indices

2014-10-01 Thread Brian Wilkins
In Splunk, it is possible to detect tampering of logs. Splunk will take an 
event at ingestion time and create a hash value based on the event and your 
certificates/keys.  You can then write searches that will re-hash the event 
to be compared to the original to indicate if anything has changed.  We 
need something like that. 

How is that possible with elasticsearch? 

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3b724745-88ac-4484-9d21-284ec28697a9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Kibana server-side integration with R, Perl, and other tools

2014-10-01 Thread Brian
Lance,

Thanks for the clarification. Yeah, the consensus seems to be to either 
issue the same REST command off-line (not available to Windows PMs, since I 
am not going to touch Windows with a pole shorter than 25m :-), or to write 
a server plug-in (would allow even Windows users to invoke the scripts).

But one question: When I click on the Info button near the upper right of a 
panel, it shows the JSON request as invoked by curl. But that's only a 
suggestion, right? In other words, my browser is not using curl?

I've run into issues with curl's buffer limitations with large queries, and 
am hoping that Kibana is only giving me a suggestion to use curl, but isn't 
telling my browser to use curl.

Brian

On Friday, September 26, 2014 2:51:38 PM UTC-4, Lance A. Brown wrote:

 On 2014-09-25 11:57 am, Brian wrote: 
  And as my part of the bargain, I will use Perl, R, or whatever else is 
  at my disposal to create custom commands that can run on the Kibana 
  host and perform all of the analysis that our group needs. 

 Something to remember: The Kibana host is your browser.  The current 
 version of Kibana run entirely within the browser, making calls to 
 Elasticsearch for data, processing it and generating graphs all within 
 the browser.  There is no server-side operating component, just static 
 files that get loaded into your browser. 


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9dbdb62e-7473-4d71-81f8-3ed27e90c2fe%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: elasticsearch blocked futex

2014-09-30 Thread Brian
Chris,

This sounds very suspiciously like a problem we had. We set up an 
experimental local ELK server (one node in the cluster) and fed it with 
logstash. I was manually cleaning up older data using the Elasticsearch 
Head plug-in, but over one weekend the cluster got into a funky state. The 
curl API said it was Yellow, but ES Head showed Green, and queries were 
hanging.

This was a VM that was dedicated to ES with 1TB disk space (only about 2% 
was ever used at any point in time), 4 CPUs, and 24GB RAM (though the Java 
JVM was not tuned to take advantage of all of this memory). Kibana was 
hosted as a site plug-in, but its usage was very light. Though I had been 
playing around with increasing the size limit of responses way past the 
default of 500, and I'm sure the ES server bore the brunt of that.

I stopped and restarted ES and everything went back to normal.

I installed Curator to clean up older indices automatically, and the 
problem has never returned. (I have also stopped telling Kibana to ask for 
up to 5 response documents on a query!)

I suspect you're getting some sort of OOM condition and that's when things 
start looking odd.

Anyway, OOM is just a wild guess. I wouldn't have mentioned something so 
nebulous, but the symptoms you have are strikingly close to the ones we saw.

Brian

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ccf8ab3d-c89c-42da-95ba-1b25198fc445%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: search for available fields

2014-09-30 Thread Brian
The following query will return all of the mappings for all of the indices 
on the specified ES host:

$ curl -XGET 'http://*hostname*:9200/_all/_mapping?pretty=true' && echo

You can read the JSON, or else parse it and extract the details you need. 
For example, if you have automatic mapping enabled, this is a very good way 
to not only discover the searchable fields but also see if they are strings 
or numbers.

Brian

On Tuesday, September 30, 2014 11:15:32 AM UTC-4, shooali wrote:

 Hi,

 What is the most efficient way to get all available fields to search on 
 for a preliminary search criteria?

 Thanks,

 Shooali 


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/95cd8e08-bba6-451c-96c5-8e3c6a1e1fff%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How does logstash chose which timestamped index to use?

2014-09-30 Thread Brian
Matt,

Assuming your logstash configuration correctly sets the @timestamp field, 
logstash will store the document in the daily index that corresponds to the 
@timestamp value.

I have verified this behavior by observation over the time we have been 
using the ELK stack.

For example, we have a Perl CGI script that is used to emulate a customer 
service. It has a hard-coded ISO-8601 date string which our logstash 
configuration finds before it notices the syslog date. And so that log 
entry ends up in the day in the past that the hard-coded string specifies. 
And then curator cleans it up each and every day.

Bottom line: logstash already respects the day in the @timestamp when 
storing data in ES.
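
For reference, the piece that makes this work is the date filter. A minimal 
sketch (the source field name logdate and its format are just examples):

filter {
  date {
    # The parsed value replaces @timestamp, which in turn selects the
    # daily logstash-YYYY.MM.DD index name.
    match => [ "logdate", "ISO8601" ]
  }
}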

Brian

On Tuesday, September 30, 2014 2:31:59 PM UTC-4, Matt Hughes wrote:



 I have a logstash-forwarder client sending events to lumberjack - 
 elasticsearch to timestamped logstash indices.  How does logstash decide 
 what *day* index to put the document in.  Does it look at @timestamp?  
 @timestamp is just generated when the document is received, correct?  So if 
 you logged an event on a client at 11 pm UTC but it didn't make it to 
 elasticsearch until 1am UTC the next day, which index would it go in?  
 Would it go in the day it was created or would it go in the day it got to 
 elasticsearch?  

 If the latter, is there a way to force logstash to respect a date field in 
 the original log event?


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3931b0d7-6923-4dce-a524-33b49d04af01%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Kibana server-side integration with R, Perl, and other tools

2014-09-29 Thread Brian
Thanks, Jörg. I will need to find some time to look into this, as it seems 
exactly like what I was looking for.

Thanks again!

Brian

On Monday, September 29, 2014 12:21:00 PM UTC-4, Jörg Prante wrote:

 It is quite easy to add a wrapper as a plugin in ES in the REST output 
 routine around search responses, see
 https://github.com/jprante/elasticsearch-arrayformat

 or

 https://github.com/jprante/elasticsearch-csv

 If the CSV plugin has deficiencies, I would like to get feedback what is 
 missing/what can be added. With a bit of hacking, it is possible to write 
 ES plugin(s) that can trigger the creation of graphviz, gnuplot, R etc. 
 plots instead of delivering temporary CSV files.

 Jörg



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/02166e89-b8ad-4778-800a-77e6d01dc8ac%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Kibana server-side integration with R, Perl, and other tools

2014-09-25 Thread Brian
Ash,

JSON is a natural for Kibana's Javascript to read and therefore emit as 
CSV. So what I was really asking is whether Kibana is going to become a 
serious contender and allow user-written commands to be inserted into the 
pipeline between data query/response and charting. After my few weeks with 
R, I have gotten it to far exceed GNUPlot for plotting (even with the base 
plotting functions; I haven't yet dived into the ggplot2 package), and to 
also far exceed Kibana. For example, setting up a custom dashboard is 
tedious, and it's not easily customizable.

Now, I am not suggesting that the ELK stack turn into Splunk directly. But 
since it wants to become a serious contender, I am strongly recommending 
that the ELK team take the next step and allow a user-written command to be 
run against the Kibana output and its charting. And I recommend that the 
output be CSV because that's what R supports so naturally. And with R, I 
can build out custom analysis scripts that are flexible (and not hard-coded 
like Kibana dashboards).

For example, I have an R script that gives me the most-commonly used 
functions that the Splunk timechart command offers. And with all of its 
customizability: Selecting the fields to use as the analysis, the by field 
(for example, plotting response time by host name), the statistics (mean, 
max, 95th percentile, and so on), even splitting the colors so that the 
plot instantly shows the distribution of load across 10 hosts that reside 
within two data centers.

This is an excellent (and free) book that shows what Splunk can do by way 
of clear examples:

http://www.splunk.com/goto/book

Again, I don't suggest that Kibana duplicate this. But I strongly suggest 
that Kibana gives me a way to insert my own commands into the processing so 
that I can implement the specific functions that our group requires, and 
can do it without my gorpy Perl script and copy-paste command mumbo-jumbo, 
and instead in a much more friendly and accessible way that even the PMs 
can run from their Windows laptops without touching the command line.

And as my part of the bargain, I will use Perl, R, or whatever else is at 
my disposal to create custom commands that can run on the Kibana host and 
perform all of the analysis that our group needs.

Brian

On Wednesday, September 24, 2014 4:34:43 PM UTC-4, Ashit Kumar wrote:

 Brian,

 I like the direction you are going down and am trying to do that myself. 
 However, being a perl fledgling, I am still battling Dumper etc. I would 
 appreciate it if you could share your code to convert and ES query to CSV. 
 I want to use aggregations and print/report/graph results. Kibana is very 
 pretty and does the basics well, but I want to know who used web mail and 
 order it by volume of data sent by hour of day and either graph / tabulate 
 / csv out the result. I just cant see how to do that with Kibana.

 Thanks

 Ash



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b04ee873-23a6-40dd-a91b-7fa304634715%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


How to fix IndexMissingException

2014-09-09 Thread Brian Wilkins
I recently ran into an issue where my cluster is reporting an 
IndexMissingException. I tried deleting the faulty index, but I keep 
getting the same error returned. How do I fix this problem?

$ curl -XDELETE 'http://localhost:9200/logstash-2014.09.04.11'

{"error":"IndexMissingException[[logstash-2014.09.04.11] 
missing]","status":404}

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8b6cd6fb-14b9-4775-9750-7352c4c1369e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Using elasticSearch as repository for UDP published Ceilometer data through logstash and exception is being thrown 'invalid version format'

2014-09-02 Thread Brian Callanan
Bump... Anyone???


On Friday, August 29, 2014 11:28:01 AM UTC-4, Brian Callanan wrote:


 Hi, Need a little help. I'm Using Openstack Ceilometer and I've configured 
 it to push metered data over UDP to a host:port. I installed logstash and 
 configured it to receive the the UDP data from Ceilometer using the codec: 
 msgpack.
 This works great! Really! Now I'm trying to Stuff the data on output into 
 ElasticSearch and its getting an exception when pushing data into 
 elasticsearch. Pushed data throws the following from elastic search:

 [2014-08-29 11:05:08,646][WARN ][http.netty   ] [Amphibian] 
 Caught exception while handling client http traffic, closing connection 
 [id: 0x7d45e4d7, /127.0.0.1:53745 = /127.0.0.1:9200]
 java.lang.IllegalArgumentException: invalid version format: 
 LOGSTASH-LINUX-CAL-13046-2010L9O160SXTFILI-RJ6DDVLG LINUX-CAL   
 10.2.3.23
 at 
 org.elasticsearch.common.netty.handler.codec.http.HttpVersion.init(HttpVersion.java:102)
 at 
 org.elasticsearch.common.netty.handler.codec.http.HttpVersion.valueOf(HttpVersion.java:62)
 at 
 org.elasticsearch.common.netty.handler.codec.http.HttpRequestDecoder.createMessage(HttpRequestDecoder.java:75)
 at 
 org.elasticsearch.common.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:189)
 at 
 org.elasticsearch.common.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:101)
 at 
 org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500)
 at 
 org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
   ...

 *Can anyone shed any light on why the exception is being thrown?*

 My elastic search version:

 brian.callanan@linux-cal 143 % ./elasticsearch -v
 Version: 1.3.2, Build: dee175d/2014-08-13T14:29:30Z, JVM: 1.7.0_40

 My Logstash version

 brian.callanan@linux-cal 159 % logstash -V
 logstash 1.4.2

 My logstash conf

 input {
   udp {
 codec => msgpack # codec (optional), default: "plain"
 port => 40001 # number (required)
 type => "ceilometer" # string (optional)
   }
 }
 output {
   elasticsearch {
   host => "localhost"
   port => 9200
   codec => json
   }
   stdout { codec => rubydebug }
 }

  A sample data:
 {
  counter_name = network.incoming.bytes.rate,
   resource_id = 
 instance-0017-bec82aeb-b06a-4569-8b91-fcb6acd491e0-tap06349b1b-2d,
 timestamp = 2014-08-29T13:49:12Z,
counter_volume = 8285.,
   user_id = cbf803c4aeb6415eb492c04ed8debe2c,
 message_signature = 
 e96ade5e06e1ec903e459f4c8a383413d1058bda0c1f7546dea62800e5f289f8,
 resource_metadata = {
  name = tap06349b1b-2d,
parameters = {},
  fref = nil,
   instance_id = bec82aeb-b06a-4569-8b91-fcb6acd491e0,
 instance_type = 3422a1d6-d61c-4577-9d38-47e1b25e8ad3,
   mac = fa:16:3e:a5:82:09
 },
source = openstack,
  counter_unit = B/s,
project_id = e7a434ef0aa549c9824d963029a02454,
message_id = 4210ce68-2f83-11e4-9f59-f01fafe5cc22,
  counter_type = gauge,
  @version = 1,
@timestamp = 2014-08-29T13:49:12.410Z,
  tags = [],
  type = ceilometer,
  host = 10.2.24.7
 }



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3970e4c3-d7c1-4056-b5bc-636473558d5b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Multi-field collapsing

2014-08-29 Thread Brian Hudson
I have a use case which requires collapsing on multiple fields.

As a simple example assume I have some movie documents indexed with the 
fields: Director, Actor, Title & Release Date. I want to be able to 
collapse on Director and Actor, getting the most recent movie (as indicated 
by Release Date).

I think the new top hits aggregation gets me most of what I need. I 
can create a terms aggregation on Director, with a sub terms aggregation on 
Actor, and add a top hits aggregation to that (size 1). Would this be the 
proper approach? By traversing over the aggregations I can get all of the 
hits that I want - however I can't (have elasticsearch) sort or page them.

It's almost like I'd need a hitCollector aggregation which would collect 
all search hits generated by its sub-aggregations and allow me to specify 
sort and paging information at that level. Thoughts?

Brian

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/318b7474-004f-4244-90e8-d9b93639481f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Kibana server-side integration with R, Perl, and other tools

2014-08-25 Thread Brian
Is there some existing method to integrate processing between the Kibana/ 
Elasticsearch response JSON and the graphing?

For example, I have a Perl script that can convert an Elasticsearch JSON 
response into a CSV, even reversing the response to put the oldest event 
first (for gnuplot compatibility). I then have an R script that can accept 
a CSV and perform custom statistical analysis from it. It can even 
auto-detect the timestamp and ordering and reverse the CSV events (adapting 
without change to either an Elasticsearch response as CSV, or a direct CSV 
export from Splunk).

I've showed the process to a few people, but all balk outright or else shy 
away politely at the thought of going to Kibana's Info button, copying and 
pasting the curl-based query, and then running it along with the Perl CSV 
conversion script and R processing script from the command line. And I 
can't blame them!

It may be that Kibana already has the capability to pipe data through 
server-installed commands and scripts, but my lack of Javascript experience 
and lack of Kibana internals expertise doesn't seem to help me discover it.

Or perhaps this would be a great new addition to Kibana:

1. Allow a server-side command to sit between the response and the charting.
2. Deliver the response as a CSV with headers, including the @timestamp 
field of course, to the server-side command, along with the appropriate 
arguments and options for the particular panel.
3. Document the graphite / graphviz / other format required to display the 
plots.

Just a thought.

Brian

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/132cfc20-ea67-42c8-a518-48404593d35d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Terms Filter Assistance

2014-08-20 Thread Brian
We have 2 indices (logs & intel) and are trying to search 2 fields in the 
logs index (src & dst) for any match from the intel ip field. The challenge 
is that the terms filter expects a single document containing all the values 
to be searched for. The intel index has over 150k documents.

Is there a way to extract the ip field from the intel index (aggregations 
maybe) and use that to search the src & dst fields in the logs index?

Here is the code I am trying to use:

curl -XGET localhost:9200/logs/_search -d '{
  "query" : {
    "filtered" : {
      "filter" : {
        "terms" : {
          "src" : {
            "index" : "intel",
            "type" : "ipaddress",
            "id" : "*",
            "path" : "ip"
          },

          "dst" : {
            "index" : "intel",
            "type" : "ipaddress",
            "id" : "*",
            "path" : "ip"
          }

        }
      }
    }
  }
}'

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b2d9d8c9-4747-4cb6-badc-4752345544dc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: bulk thread pool rejections

2014-08-14 Thread Malia, Brian
I found out that the rejections on ES are retried by logstash after a short 
delay. Increasing the queue by too much costs more memory in ES, which takes 
away from merges, searches, etc.

I increased threadpool.bulk.queue_size from 50 to 100, and I now see no lost 
messages due to the rejections.
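
For reference, that change is a single line in elasticsearch.yml (100 is 
simply the value that worked here):

threadpool.bulk.queue_size: 100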


From: Robert Gardam robert.gar...@fyber.commailto:robert.gar...@fyber.com
Reply-To: 
elasticsearch@googlegroups.commailto:elasticsearch@googlegroups.com 
elasticsearch@googlegroups.commailto:elasticsearch@googlegroups.com
Date: Thursday, August 14, 2014 at 5:55 AM
To: elasticsearch@googlegroups.commailto:elasticsearch@googlegroups.com 
elasticsearch@googlegroups.commailto:elasticsearch@googlegroups.com
Subject: Re: bulk thread pool rejections

Did you resolve this issue? I was seeing the exact thing in my setup. I also 
have my bulk messages set to 5k in logstash. Originally I had set the thread 
pool to unlimited but this apparently causes some strange issues with stability.


On Tuesday, April 8, 2014 5:00:32 PM UTC+2, shift wrote:
I tried lowering the logstash threads, but I am unable to keep up with the 
incoming message rate.  It is important that I index messages in real time, but 
equally important that I am not losing messages.  :)

To keep indexing real time I need 200 logstash output threads with a flush size 
of 5000 sending bulk messages to each node in the elasticsearch cluster, but I 
am concerned that I am losing messages with these rejections.

I increased the queue size to 500, I will see if this helps.


On Wednesday, April 2, 2014 11:34:43 AM UTC-4, Drew Raines wrote:
shift wrote:

 I am seeing a high number of rejections for the bulk thread pool
 on a 32 core system.  Should I leave the thread pool size fixed
 to the # of cores and the default queue size at 50?  Are these
 rejections re-processed?

 From my clients sending bulk documents (logstash), do I need to
 limit the number of connections to 32?  I currently have 200
 output threads to each elasticsearch node.

The rejections are telling you that ES's bulk thread pool is busy
and it can't enqueue any more to wait for an open thread.  They
aren't retried.  The exception your client gets is the final word
for that request.

Lower your logstash threads to 16 or 32, monitor rejections, and
gradually raise.  You could also increase the queue size, but keep
in mind that's only useful to handle spikes.  You probably don't
want to keep thousands around waiting since they take resources.

Drew


 "bulk" : {
   "threads" : 32,
  * "queue" : 50,*
   "active" : 32,
  * "rejected" : 12592108,*
   "largest" : 32,
   "completed" : 584407554
 }

 Thanks!  Any feedback is appreciated.

--
You received this message because you are subscribed to a topic in the Google 
Groups elasticsearch group.
To unsubscribe from this topic, visit 
https://groups.google.com/d/topic/elasticsearch/6oNFDzWZv98/unsubscribe.
To unsubscribe from this group and all its topics, send an email to 
elasticsearch+unsubscr...@googlegroups.commailto:elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b84ceec3-145c-4129-9691-af1ad791aa57%40googlegroups.comhttps://groups.google.com/d/msgid/elasticsearch/b84ceec3-145c-4129-9691-af1ad791aa57%40googlegroups.com?utm_medium=emailutm_source=footer.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0FCFDBE5A17E804A9FCD5500B8E1C53FB224D4E7%40atl1ex10mbx2.corp.etradegrp.com.
For more options, visit https://groups.google.com/d/optout.


Re: Some observations with Curator

2014-08-06 Thread Brian
Aaron,

Well, now I feel a little foolish. Perhaps it was from my initial attempt 
to put --logfile at the end of the command instead of before the action:

$ curator delete --older-than 8 --logfile /tmp/curator.log
usage: curator [-h] [-v] [--host HOST] [--url_prefix URL_PREFIX] [--port 
PORT]
   [--ssl] [--auth AUTH] [-t TIMEOUT] [--master-only] [-n] [-D]
   [--loglevel LOG_LEVEL] [-l LOG_FILE] [--logformat LOGFORMAT]
   {show,allocation,alias,snapshot,close,bloom,optimize,delete}
   ...
curator: error: unrecognized arguments: --logfile /tmp/curator.log

So I changed it to -l before I moved it, based on the error message above. 
But you're correct: It does accept both forms of the option:

# For testing: Works fine and stores the log in /tmp/curator.log

$ curator --logfile /tmp/curator.log delete --older-than 8
# Older CentOS server; it's 2.7.5 on my MacBook (Mavericks) and
# HP laptop (Ubuntu 14.04 LTS):
$ python --version
Python 2.6.6

# Latest released version:
$ curator --version
curator 1.2.2

Brian

On Tuesday, August 5, 2014 8:18:24 PM UTC-4, Aaron Mildenstein wrote:

 Hmm.  What version of python are you using?  I am able to use --logfile 
 or -l interchangeably.

 I'm glad you like Curator, and I like KELTIC :)  Nice acronym.



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/764826ca-3da6-419e-807a-f940cd86a8a6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: transport client? really?

2014-08-06 Thread Brian
Here is my experience. Yours may vary.

I also use the TransportClient. And then I wrap our business rules behind 
another server that offers an HTTP REST API but talks to Elasticsearch on 
the back end via the TransportClient. This server uses Netty and the LMAX 
Disruptor to provide low-resource high-throughput processing; it is 
somewhat like Node.js but in Java instead of JavaScript.

Then I have a bevy of command-line maintenance and test tools that also use 
the TransportClient. I wrap them inside a shell script (for example, 
Foobar.main is wrapped inside foobar.sh) and convert command-line options 
(such as -t person) into Java properties (such as TypeName=person), and 
also set the classpath to all of the Elasticsearch jars plus all of mine.

Whenever there is a compelling change to Elasticsearch, I upgrade, and many 
times I have watched my Java builds fail with all of the breaking changes. 
But even with the worst of the breaking changes, it was down for maybe a 
day or two at the most; the API is rather clean, and this newsgroup is a 
life saver, and so I never got stuck. And when I was done, I had learned 
even more about the ES Java API.

So it's either a huge pain or it's the joy of learning, depending on your 
point of view. I have always viewed it as the joy of learning.

I just wish the Facets-to-Aggregations migration was smoother. But I sense 
that there will be another breaking change on my horizon. This will be 
particularly sad for me, as I had implemented a rather nice hierarchical 
term frequency combining mvel and facets. Which are now deprecated and on 
the list to be removed. But again, I'll learn a lot when making the 
migration.

I believe it was Thomas Edison who said that most people miss opportunities 
because the opportunities come dressed in overalls and look like work. But 
I digress :-)

Brian

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/40a95f8f-e616-4086-837e-071539078fd4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Some observations with Curator

2014-08-05 Thread Brian
Using the most recent release (1.2.2) of Curator, I noticed that the 
documentation says --logfile while curator itself rejects --logfile 
anywhere and requires -l in front of the other options to direct its log 
entries. No big deal; I just tested it until it worked before adding it to 
the cron job. And it is working superbly.
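
For the record, the cron entry itself is a one-liner; the schedule, path to 
curator, and log file below are just examples:

# Delete indices older than 8 days, every night at 01:00
0 1 * * * /usr/bin/curator -l /var/log/curator.log delete --older-than 8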

We will be standing up several ELK instances in various QA data centers to 
analyze several independent product load tests. These ELK instances are 
also independent, as we do not wish to flood the logstash data across any 
of our inter-data-center VPN / router connections. And because they are 
independent, our operations folks are leery of manually keeping track of 
multiple instances of the ELK stack with which they have no familiarity.

And so, Elasticsearch Curator is becoming an integral part of the 
automation of the ELK stack for us, as it helps to keep our hard-working 
operations folks from overload. We wish for ELK to be an asset and not an 
added drain on time and effort, and Curator is a vital part of that goal. 
To the point where I no longer think of it as simply the ELK stack, but 
rather the KELTIC stack:

*Kibana, Elasticsearch, Logstash, Time-based Indices, Curator*.

But whether ELK or KELTIC, the stack is awesome! Many thanks to all who 
contributed and who continue to drive it forward!

Brian

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/39d8300d-27fc-42da-b10b-3bb8280573d4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Python version for curator

2014-08-04 Thread Brian
An update: I have installed curator 1.2.2 by downloading the zip archive, 
unpacking it, and then installing it directly:

$ cd curator-1.2.2
$ sudo python setup.py install

Not sure if it's the fix since the previous version of curator, or else the 
pip-less install. But either way, it's working fine just as expected. And 
it works superbly!

Brian

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2fb50c60-56cb-47f5-b284-f723ba10e93f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Failure to execute ttl purge

2014-08-04 Thread Brian
What version of Elasticsearch? Of Java?

How is TTL being used? For example, one extreme is to constantly add log 
data and then delete old data. This case is, of course, best handled with 
time-based indices and a tool such as curator to delete old data by index 
and not by individual document via TTL.

I have run some test cases with TTL using Elasticsearch 1.3.0 and Java 
7u60. I set a _ttl value of 5m and hammered Elasticsearch. After some time 
passed and several million documents had been added, I shut down the test 
and watched the TTL processing clean up. It took some time but it always 
succeeded. It was rather nice to see that my TTL tests were self-cleaning: 
I always ended up with an empty index after each run.
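
For reference, a minimal sketch of the mapping used for that kind of test 
(the index and type names are made up):

$ curl -XPUT 'localhost:9200/ttltest' -d '{
  "mappings" : {
    "doc" : {
      "_ttl" : { "enabled" : true, "default" : "5m" }
    }
  }
}'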

This discussion may also shed a bit of light: 
http://elasticsearch-users.115913.n3.nabble.com/TTL-Load-Problems-td4024001.html

Brian

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8544b414-5017-465e-bdd2-665e0e52db7d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Node Client with bulk request indefinitely blocked thread when ClusterBlockException is being thrown

2014-08-04 Thread Brian
Alex,

By the way, is this bug seen with the TransportClient also, or just the 
NodeClient?

Thanks!

Brian

On Monday, August 4, 2014 4:27:35 AM UTC-4, Alexander Reelsen wrote:

 Hey,

 Just a remote guess without knowing more: On your client side, the 
 exception is wrapped, so you need to unwrap it first.


 --Alex


 On Wed, Jul 23, 2014 at 9:47 AM, Cosmin-Radu Vasii cosminra...@gmail.com 
 javascript: wrote:

 I am using the dataless NodeClient to connect to my cluster (version is 
 1.1.1). Everything is working ok, except when failures occur. The scenario 
 is the following: 
 -I have an application java based which connects to ES Cluster 
 (application is started and the cluster is up and running) 
 -I shutdown the cluster 
 -I try to send a bulk request 
 -The following exception is displayed in the logs, which is normal. But 
 my call never catches the exception: 

 Exception in thread elasticsearch[Lasher][generic][T#6] 
 org.elasticsearch.cluster.block.ClusterBlockException: blocked by: 
 [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master]; 
 at 
 org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:138)
  
 at 
 org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:128)
  
 at 
 org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:197)
  
 at 
 org.elasticsearch.action.bulk.TransportBulkAction.access$000(TransportBulkAction.java:65)
  
 at 
 org.elasticsearch.action.bulk.TransportBulkAction$1.onFailure(TransportBulkAction.java:143)
  
 at 
 org.elasticsearch.action.support.TransportAction$ThreadedActionListener$2.run(TransportAction.java:117)
  
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  
 at java.lang.Thread.run(Thread.java:724) 

 My code is something like this 

 BulkResponse response; 
 try { 
 response = requestBuilder.execute().actionGet(); 
 } 
 catch(NoNodeAvailableException ex){ 
 LOGGER.error("Cannot connect to ES Cluster: " + 
 ex.getMessage()); 
 throw ex; 
 } 
 catch (ClusterBlockException ex){ 
 LOGGER.error("Cannot connect to ES Cluster: " + 
 ex.getMessage()); 
 throw ex; 
 } 
 catch (Exception ex) { 

 LOGGER.error("Exception in processing indexing request by ES " + 
 "server. " + ex.getMessage()); 
 } 

 When I use a single request everything is ok. I also noticed a TODO in 
 the ES code in the TransportBulkAction.java 

 private void executeBulk(final BulkRequest bulkRequest, final long 
 startTime, final ActionListener<BulkResponse> listener, final 
 AtomicArray<BulkItemResponse> responses ) { 
 ClusterState clusterState = clusterService.state(); 
 // TODO use timeout to wait here if its blocked... 
 
 clusterState.blocks().globalBlockedRaiseException(ClusterBlockLevel.WRITE); 

 } 

 Is this a known situation or a known bug or I am missing something? 
  
 -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearc...@googlegroups.com javascript:.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/109057dc-70c4-471a-bd6d-8b8e72c37ff6%40googlegroups.com
  
 https://groups.google.com/d/msgid/elasticsearch/109057dc-70c4-471a-bd6d-8b8e72c37ff6%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/02334b56-2853-4105-bfab-3566d20a721c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Creating elasticsearch index mandatory?

2014-08-04 Thread Brian
By default, Elasticsearch automatically creates an index if a document is 
being added and the index doesn't already exist.

Logstash automatically specifies a time-based index with day precision for 
each log entry. In other words:

logstash-2014.07.28
logstash-2014.07.29
logstash-2014.07.30
logstash-2014.07.31
logstash-2014.08.01
logstash-2014.08.02
logstash-2014.08.03
logstash-2014.08.04

And Kibana's time picker automatically assumes the logstash defaults, so 
you should be good to go.

One thing that initially tripped me up, and might trip you up: When I first 
ran Kibana I didn't see any of my data. But that's because I had loaded 
some test data into it, and the default time picker only went back a few 
minutes into the past.

Brian

On Monday, August 4, 2014 4:03:05 PM UTC-4, Acche Din wrote:

 Hello All,

 I have a ELK setup 'out of the box' . My goal is to parse apache logs via 
 logstash and display it in kibana.

 I would like to know if it is mandatory to create an index on 
 elasticsearch so as to store the result from apache logs(I have 
 logstash.conf output=elasticsearch)


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3abf0a58-7713-4e06-a272-e5d579ea4281%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: SIREn plugin for nested documents

2014-07-23 Thread Brian
Thanks for the link. Unfortunately, Chrome on Mac OS (latest versions of 
each) causes this web page to blank and redisplay continually. Can't read 
it; hope you can.

In a previous life, I created a search engine that handled parent/child 
relationships with blindingly fast performance. One trick was that the 
index didn't just contain the document ID, but it contained the entire 
hierarchy of IDs. So, for example (and brevity, the IDs are single letters):

Document ID and
relationship         Fully qualified and indexed ID
------------------   ------------------------------
A                    A
   B                 A.B
      C              A.B.C
   D                 A.D
      E              A.D.E
      F              A.D.F

So for example, it was nearly instantaneous to determine that, just by 
looking at and comparing the fully qualified IDs:

A and F are in the same parent-child hierarchy, with F being a child of D 
and a grandchild of A.

E and F are siblings under the same parent.

And so on.

Not sure how this would mesh with Lucene though. But complex parent-child 
relationships could be intersected just by the fully qualified IDs that 
came out of the inverted index. Documents did not need to be fetched or 
cached to perform this operation, and the result was breathtakingly 
blindingly fast performance.
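
This is not how Lucene or Elasticsearch works today, but as a rough sketch 
of the same idea in Elasticsearch terms (the index, field, and values are 
made up): store the fully qualified ID in a not_analyzed field and use a 
prefix filter to pull back an entire subtree, such as everything under A.D:

$ curl -XGET 'localhost:9200/tree/_search' -d '{
  "query" : {
    "filtered" : {
      "filter" : { "prefix" : { "path" : "A.D." } }
    }
  }
}'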

Just FYI. I can discuss off-line if anyone wishes.

Brian

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5b6ef1ce-3daf-4de5-b106-710fd306863d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Can one do a singular/plural phrase match with the Query String Query?

2014-07-21 Thread Brian Jones
Can one perform the following query using wildcards ( instead of two 
distinct phrases ) when using a Query String Query?
"photographic film" OR "photographic films"

These do not seem to work, and return the same number of results as just 
"photographic film":
"photographic film?"
"photographic film*"

Can wildcards not be placed inside Exact Phrase queries?  Is there a way to 
mimic this?

My goal is to be able to perform queries like this:
"photo* film?"

... capturing:
photo film
photo films
photographic films
photography films
etc...

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/66cc151f-a235-40d4-a125-2236aae0f9bf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Specifying a Phrase within a Proximity Phrase Search?

2014-07-18 Thread Brian Jones
I'm using the a Query String Query to perform a Proximity Search.

I'm wondering if ( and if yes how ) I can nest a phrase within the overall 
phrase:
"wood glue manufacturer"~5 ( where "wood glue" would be kept as a phrase )

My users have access to a Query String Query box and I'm exploring more 
advanced search capability through this box ... so performing the 
equivalent with other Query types is not helpful here ... for instance, I 
know that I can use a Span Near Query to accomplish this.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e77f989c-81d9-4ccd-9723-e82f371faf20%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [ANN] Log4j2 Elasticsearch appender

2014-07-18 Thread Brian
Awesome! I had been wondering to myself about this for a while.

Brian

On Friday, July 18, 2014 4:08:14 AM UTC-4, Jörg Prante wrote:

 Hi,

 I released a Log4j2 Elasticsearch appender

 https://github.com/jprante/log4j2-elasticsearch

 in the hope it is useful.

 Best,

 Jörg


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/cff91677-fe6d-48af-85df-dcd09880c44d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Dropped HTTP Connections when Indexing

2014-07-18 Thread Brian Jones
I'm trying to scale my indexing for the first time, and I'm running into 
connections problems.  I reach a scale where cURL connections from my 
indexers start getting cURL7 errors ( connect failed ).  It looks like ES 
just stops accepting all HTTP connections for a period of time.  I cannot 
find the root cause.

I'm running on an Amazon C3.4XL.  The processors are not maxed, memory is 
not maxed, IO is not showing issues.  I'm not seeing problems in the ES 
log, but I'm not sure I have logging fully enabled. I've tried increasing 
the thread_pool for the indexer, and that doesn't help.  I'm not seeing any 
rejected connections there.  I'm at a loss.

The closest I can get is a guess using data from Bigdesk.  When the number 
of HTTP channels starts exceeding the number of transport channels, I start to 
see the problem emerge.  I have no idea if this is related, but it's the 
only metric I've traced that seems correlated.

Thoughts?

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0fd75b96-3c00-47f1-9a59-3c9707f38734%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Updating Datatype in Elasticsearch

2014-07-16 Thread Brian
Within my configuration directory's templates/automap.json file is the 
following template. Elasticsearch uses this template whenever it generates 
a new logstash index each day:

{
  "automap" : {
    "template" : "logstash-*",
    "settings" : {
      "index.mapping.ignore_malformed" : true
    },
    "mappings" : {
      "_default_" : {
        "numeric_detection" : true,
        "_all" : { "enabled" : false },
        "properties" : {
          "message" : { "type" : "string" },
          "host" : { "type" : "string" },
          "UUID" : { "type" : "string", "index" : "not_analyzed" },
          "logdate" : { "type" : "string", "index" : "no" }
        }
      }
    }
  }
}

Note:

1. How to ignore malformed data (for example, a numeric field that contains 
no-data every once in a while).

2. How to automatically detect numeric fields. Logstash makes every JSON 
value a string. Elasticsearch automatically detects dates, but must be 
explicitly configured to automatically detect numeric fields.

3. Listing fields that must be considered to be strings even if they 
contain numeric values, or must not be analyzed, or must not be indexed at 
all.

4. Disabling of the _all field: As long as your logstash configuration 
leaves the message field pretty much intact, disabling the _all field will 
reduce disk space and increase performance while still keeping all search 
functionality. But then, don't forget to also update your Elasticsearch 
configuration to specify message as the default field (see the one-liner 
below).
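
If you prefer a configuration file entry over a -D option on the Java 
command line, the equivalent one-liner in elasticsearch.yml is:

index.query.default_field: message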

Hope this helps!

Brian



Re: Python version for curator

2014-07-16 Thread Brian
No joy:

$ *pip install elasticsearch*
Requirement already satisfied (use --upgrade to upgrade): elasticsearch in 
/usr/lib/python2.6/site-packages
Cleaning up...

$ *curator --help*
Traceback (most recent call last):
  File "/usr/bin/curator", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 2655, in <module>
    working_set.require(__requires__)
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 648, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 546, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: elasticsearch>=1.0.0,<2.0.0

$ *uname -a*
Linux elktest 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 
x86_64 x86_64 x86_64 GNU/Linux

Brian



Python version for curator

2014-07-14 Thread Brian
A quick question: Is Python 2 acceptable for use with curator, or is Python 
3 required?

Thanks!

Brian



Re: Python version for curator

2014-07-14 Thread Brian
To continue, I installed curator on a Python 2.6.6 system thusly:

pip install elasticsearch-curator

And Elasticsearch 1.2.1 is installed on the same server. But when running 
curator --help, I see:

*$ curator --help*
Traceback (most recent call last):
  File "/usr/bin/curator", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 2655, in <module>
    working_set.require(__requires__)
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 648, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 546, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: elasticsearch>=1.0.0,<2.0.0

This was per the information found at: 
https://github.com/elasticsearch/curator

I'm not a Python dev (yet, anyway) but I don't believe I left anything out 
that was explicitly mentioned on the curator github page.

Brian

On Monday, July 14, 2014 3:00:27 PM UTC-4, Brian wrote:

 A quick question: Is Python 2 acceptable for use with curator, or is 
 Python 3 required?

 Thanks!

 Brian




Sorting on Parent/Child attributes

2014-07-11 Thread Brian Rook
Hello,

I'm looking for a solution to a problem I am having.  Let's say I have two 
types, Person and Pet, in an index called customers.
Person
-account
-firstname
-lastname
-SSN

Pet
-name
-type
-id
-account

I would like to query/filter on fields in both person and pet in order to 
retrieve people and their associated pet.  Additionally, I need to sort on 
a field that could be in either person or pet.

For example, retrieve all people/pets that have wildcard person.firstname 
'*ave*' and pet.type wildcard '*terrier*' and sort on pet.name.  Or 
wildcard search on person.SSN = '*55*' and pet.name='*mister*' and sort on 
person.lastname.

I currently have a solution where I search/sort on people or pet based on 
the sort that I am using.  I use a hasChild/hasParent to manage the fields 
that are on the 'other' type.  Then I use an id field to retrieve the 
entities of the other type.  So, if I have a sort on personfirstname, I 
query on person and child (pet) and sort on person.firstname, then use the 
accounts to retrieve the pets (by account) in another query.  This is not 
ideal because it is ugly and I suspect difficult to maintain if this 
query's requirements change in the future.

I suspect that I can do a query at the 'customers' level and do 'type' 
queries on the fields that I need for person and pet.  Similar to this:
http://joelabrahamsson.com/grouping-in-elasticsearch-using-child-documents/

However, I'm not sure how I would implement the sort.  I suspect that I 
could use a custom scoring script, but I am not sure how I would score text 
fields.
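
For what it's worth, when the sort field happens to live on the person 
side, I imagine a single request along these lines would do (untested 
sketch, field names taken from the two types above):

{
  "query" : {
    "bool" : {
      "must" : [
        { "wildcard" : { "SSN" : "*55*" } },
        { "has_child" : {
            "type" : "pet",
            "query" : { "wildcard" : { "name" : "*mister*" } }
        } }
      ]
    }
  },
  "sort" : [ { "lastname" : "asc" } ]
}

It's sorting on a field from the other type (for example pet.name) that I 
can't see how to express.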


Any thoughts?



Re: Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-11 Thread Brian Thomas
I was just curious if there was a way of doing this without adding the 
field; I can add it if necessary.

For alternatives, what if in addition to es.mapping.id, there is another 
property available also, like es.mapping.id.include.in.src, where you could 
specify whether the id field actually gets included in the source 
document?  In elasticsearch, you can create and update documents without 
having to include the id in the source document, so I think it would make 
sense to be able to do that with elasticsearch-hadoop also.
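
For context, the workaround I have in mind looks roughly like this (just a 
sketch; if I'm reading the docs right, es.write.operation=update is the 
switch for updates, and the docId field name is my own invention that I'd 
have to copy the _id into on every MapWritable before writing):

import org.apache.hadoop.conf.Configuration;

public class UpdateConfigSketch {
    public static Configuration build() {
        Configuration conf = new Configuration();
        conf.set("es.resource", "index/type");
        // switch es-hadoop from indexing to updating existing documents
        conf.set("es.write.operation", "update");
        // the extra field I would have to add to each MapWritable
        conf.set("es.mapping.id", "docId");
        return conf;
    }
}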

On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote:

 You need to specify the id of the document you want to update somehow. 
 Since in es-hadoop things are batch focused, each 
 doc needs its own id specified somehow hence the use of 'es.mapping.id' 
 to indicate its value. 
 Is there a reason why this approach does not work for you - any 
 alternatives that you thought of? 

 Cheers, 

 On 7/7/14 10:48 PM, Brian Thomas wrote: 
  I am trying to update an elasticsearch index using elasticsearch-hadoop. 
  I am aware of the *es.mapping.id* 
  configuration where you can specify that field in the document to use as 
 an id, but in my case the source document does 
  not have the id (I used elasticsearch's autogenerated id when indexing 
 the document).  Is it possible to specify the id 
  to update without having the add a new field to the MapWritable object? 
  
  

 -- 
 Costin 




Re: Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-11 Thread Brian Thomas
I was just curious if there was a way of doing this without doing this, I 
can add the field if necessary.

For alternatives, what if in addition to es.mapping.id, there is another 
property available also, like es.mapping.id.exclude, that will not include 
the id field in the source document.  In elasticsearch, you can create and 
update documents without having to include the id in the source document, 
so I think it would make sense to be able to do that with 
elasticsearch-hadoop also.

On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote:

 You need to specify the id of the document you want to update somehow. 
 Since in es-hadoop things are batch focused, each 
 doc needs its own id specified somehow hence the use of 'es.mapping.id' 
 to indicate its value. 
 Is there a reason why this approach does not work for you - any 
 alternatives that you thought of? 

 Cheers, 

 On 7/7/14 10:48 PM, Brian Thomas wrote: 
  I am trying to update an elasticsearch index using elasticsearch-hadoop. 
  I am aware of the *es.mapping.id* 
  configuration where you can specify that field in the document to use as 
 an id, but in my case the source document does 
  not have the id (I used elasticsearch's autogenerated id when indexing 
 the document).  Is it possible to specify the id 
  to update without having the add a new field to the MapWritable object? 
  
  

 -- 
 Costin 




Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-07-07 Thread Brian Thomas
Here is the gradle build I was using originally:

apply plugin: 'java'
apply plugin: 'eclipse'

sourceCompatibility = 1.7
version = '0.0.1'
group = 'com.spark.testing'

repositories {
mavenCentral()
}

dependencies {
compile 'org.apache.spark:spark-core_2.10:1.0.0'
compile 'edu.stanford.nlp:stanford-corenlp:3.3.1'
compile group: 'edu.stanford.nlp', name: 'stanford-corenlp', version: 
'3.3.1', classifier:'models'
compile files('lib/elasticsearch-hadoop-2.0.0.jar')
testCompile 'junit:junit:4.+'
testCompile group: 'com.github.tlrx', name: 'elasticsearch-test', version: '1.2.1'
}


When I ran dependencyInsight on jackson, I got the following output:

C:\dev\workspace\SparkProject> gradle dependencyInsight --dependency jackson-core

:dependencyInsight
com.fasterxml.jackson.core:jackson-core:2.3.0
\--- com.fasterxml.jackson.core:jackson-databind:2.3.0
 +--- org.json4s:json4s-jackson_2.10:3.2.6
 |\--- org.apache.spark:spark-core_2.10:1.0.0
 | \--- compile
 \--- com.codahale.metrics:metrics-json:3.0.0
  \--- org.apache.spark:spark-core_2.10:1.0.0 (*)

org.codehaus.jackson:jackson-core-asl:1.0.1
\--- org.codehaus.jackson:jackson-mapper-asl:1.0.1
 \--- org.apache.hadoop:hadoop-core:1.0.4
  \--- org.apache.hadoop:hadoop-client:1.0.4
   \--- org.apache.spark:spark-core_2.10:1.0.0
\--- compile

Version 1.0.1 of jackson-core-asl does not have the field 
ALLOW_UNQUOTED_FIELD_NAMES, but later versions of it do.

On Sunday, July 6, 2014 4:28:56 PM UTC-4, Costin Leau wrote:

 Hi,

 Glad to see you sorted out the problem. Out of curiosity what version of 
 jackson were you using and what was pulling it in? Can you share you maven 
 pom/gradle build?


 On Sun, Jul 6, 2014 at 10:27 PM, Brian Thomas brianjt...@gmail.com wrote:

 I figured it out, dependency issue in my classpath.  Maven was pulling 
 down a very old version of the jackson jar.  I added the following line to 
 my dependencies and the error went away:

 compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.13'


 On Friday, July 4, 2014 3:22:30 PM UTC-4, Brian Thomas wrote:

  I am trying to test querying elasticsearch using Apache Spark using 
 elasticsearch-hadoop.  I am just trying to do a query to the elasticsearch 
 server and return the count of results.

 Below is my test class using the Java API:

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.io.MapWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaPairRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.serializer.KryoSerializer;
 import org.elasticsearch.hadoop.mr.EsInputFormat;

 import scala.Tuple2;

 public class ElasticsearchSparkQuery{

 public static int query(String masterUrl, String 
 elasticsearchHostPort) {
 SparkConf sparkConfig = new SparkConf().setAppName(
 ESQuery).setMaster(masterUrl);
 sparkConfig.set(spark.serializer, 
 KryoSerializer.class.getName());
 JavaSparkContext sparkContext = new 
 JavaSparkContext(sparkConfig);

 Configuration conf = new Configuration();
 conf.setBoolean(mapred.map.tasks.speculative.execution, 
 false);
 conf.setBoolean(mapred.reduce.tasks.speculative.execution, 
 false);
 conf.set(es.nodes, elasticsearchHostPort);
 conf.set(es.resource, media/docs);
 conf.set(es.query, ?q=*);

 JavaPairRDDText, MapWritable esRDD = 
 sparkContext.newAPIHadoopRDD(conf, EsInputFormat.class, Text.class,
 MapWritable.class);
 return (int) esRDD.count();
 }
 }


 When I try to run this I get the following error:


 4/07/04 14:58:07 INFO executor.Executor: Running task ID 0
 14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0 
 locally
 14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit 
 [node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
 14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
 14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
 java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
 at org.elasticsearch.hadoop.serialization.json.
 JacksonJsonParser.clinit(JacksonJsonParser.java:38)
 at org.elasticsearch.hadoop.serialization.ScrollReader.
 read(ScrollReader.java:75)
 at org.elasticsearch.hadoop.rest.RestRepository.scroll(
 RestRepository.java:267)
 at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(
 ScrollQuery.java:75)
 at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(
 EsInputFormat.java:319)
 at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.
 nextKeyValue(EsInputFormat.java:255)
 at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(
 NewHadoopRDD.scala:122)
 at org.apache.spark.InterruptibleIterator.hasNext(
 InterruptibleIterator.scala:39)
 at org.apache.spark.util.Utils$.getIteratorSize

Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-07 Thread Brian Thomas
I am trying to update an elasticsearch index using elasticsearch-hadoop.  I 
am aware of the *es.mapping.id* configuration where you can specify that 
field in the document to use as an id, but in my case the source document 
does not have the id (I used elasticsearch's autogenerated id when indexing 
the document).  Is it possible to specify the id to update without having 
to add a new field to the MapWritable object?




Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-07-06 Thread Brian Thomas
I figured it out, dependency issue in my classpath.  Maven was pulling down 
a very old version of the jackson jar.  I added the following line to my 
dependencies and the error went away:

compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.13'

On Friday, July 4, 2014 3:22:30 PM UTC-4, Brian Thomas wrote:

  I am trying to test querying elasticsearch using Apache Spark using 
 elasticsearch-hadoop.  I am just trying to do a query to the elasticsearch 
 server and return the count of results.

 Below is my test class using the Java API:

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.io.MapWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaPairRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.serializer.KryoSerializer;
 import org.elasticsearch.hadoop.mr.EsInputFormat;

 import scala.Tuple2;

 public class ElasticsearchSparkQuery{

 public static int query(String masterUrl, String 
 elasticsearchHostPort) {
 SparkConf sparkConfig = new 
 SparkConf().setAppName(ESQuery).setMaster(masterUrl);
 sparkConfig.set(spark.serializer, 
 KryoSerializer.class.getName());
 JavaSparkContext sparkContext = new JavaSparkContext(sparkConfig);

 Configuration conf = new Configuration();
 conf.setBoolean(mapred.map.tasks.speculative.execution, false);
 conf.setBoolean(mapred.reduce.tasks.speculative.execution, 
 false);
 conf.set(es.nodes, elasticsearchHostPort);
 conf.set(es.resource, media/docs);
 conf.set(es.query, ?q=*);

 JavaPairRDDText, MapWritable esRDD = 
 sparkContext.newAPIHadoopRDD(conf, EsInputFormat.class, Text.class,
 MapWritable.class);
 return (int) esRDD.count();
 }
 }


 When I try to run this I get the following error:


 4/07/04 14:58:07 INFO executor.Executor: Running task ID 0
 14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0 
 locally
 14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit 
 [node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
 14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
 14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
 java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
 at 
 org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.clinit(JacksonJsonParser.java:38)
 at 
 org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:75)
 at 
 org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:267)
 at 
 org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:75)
 at 
 org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:319)
 at 
 org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:255)
 at 
 org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
 at 
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1014)
 at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
 at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
 at 
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
 at 
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
 at org.apache.spark.scheduler.Task.run(Task.scala:51)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

 Has anyone run into this issue with the JacksonJsonParser?





java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-07-04 Thread Brian Thomas
 I am trying to test querying elasticsearch from Apache Spark using 
elasticsearch-hadoop.  I am just trying to run a query against the 
elasticsearch server and return the count of results.

Below is my test class using the Java API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.serializer.KryoSerializer;
import org.elasticsearch.hadoop.mr.EsInputFormat;

import scala.Tuple2;

public class ElasticsearchSparkQuery {

    public static int query(String masterUrl, String elasticsearchHostPort) {
        SparkConf sparkConfig = new SparkConf().setAppName("ESQuery").setMaster(masterUrl);
        sparkConfig.set("spark.serializer", KryoSerializer.class.getName());
        JavaSparkContext sparkContext = new JavaSparkContext(sparkConfig);

        Configuration conf = new Configuration();
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        conf.set("es.nodes", elasticsearchHostPort);
        conf.set("es.resource", "media/docs");
        conf.set("es.query", "?q=*");

        JavaPairRDD<Text, MapWritable> esRDD = sparkContext.newAPIHadoopRDD(conf,
                EsInputFormat.class, Text.class, MapWritable.class);
        return (int) esRDD.count();
    }
}


When I try to run this I get the following error:


4/07/04 14:58:07 INFO executor.Executor: Running task ID 0
14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0 locally
14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit 
[node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.<clinit>(JacksonJsonParser.java:38)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:75)
at 
org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:267)
at 
org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:75)
at 
org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:319)
at 
org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:255)
at 
org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1014)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
at 
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
at 
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Has anyone run into this issue with the JacksonJsonParser?



Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)

2014-07-02 Thread Brian


 Patrick, 




 * Well, I did answer your question. But probably not from the direction 
 you expected. hmm no, you didn't. My question was: it looks like I cant 
 retrieve/display [_all fields] content. Any idea? and you replied with 
 your logstash template where _all is disabled. I'm interested in disabling 
 _all, but that was not my question at this point.*


Fair enough. I don't know the inner details; I am just an enthusiastic end 
user.

To the best of my knowledge, there is no content for the _all field; I view 
this as an Elasticsearch pseudo field whose name is _all and whose index 
terms are taken from all fields (by default), but still there is no actual 
content for it.

And after I got into the habit of disabling the _all field, my hands-on 
exploration of its nuances have ended. It's time for the experts to explain!
 

   
 *Your answer to my second message, below, is informative and interesting 
 but fails to answer my second question too. I simply asked whether I need 
 to feed the complete modified mapping of my template or if I can just push 
 the modified part (ie. the _all:{enabled: false} part). *


 Again, I have never done this, so I can only tell you what I do. I just 
cannot tell you all the nuances of what Elasticsearch is capable of.

My recommendation is to try it. Elasticsearch is great at letting you 
experiment and then telling you clearly if your attempt succeeds or fails.

So, try your scenario. If it fails, then it didn't work or you did 
something wrong. If it succeeds, then you can see exactly what 
Elasticsearch actually accepted as your mapping. For example:

curl 'http://localhost:9200/logstash-2014.06.30/_mapping?pretty=true' && echo

This particular query looks at one of my logstash-generated indices, and it 
lets me verify that Elasticsearch and Logstash conspired to create the 
mappings I expected. I used this command quite a bit until I finally got 
everything configured correctly. (I actually verify the mapping via 
Elasticsearch Head, but under the covers it's the same command.)

Brian



Re: Kibana browser compatibility issues

2014-07-02 Thread Brian
Laura,

The simplest way is to install Kibana as a site plug-in on the same node on 
which you run Elasticsearch. Not the best way from a performance and 
security perspective, but certainly the easiest way to start with an 
absolute minimum of extra levers to pull and knobs to turn, so to speak.

So what does that really mean, a site plugin?

Assume you configure Elasticsearch to look for plugins within the 
/opt/elk/plugins directory.

Then you unpack the Kibana3 distribution within /opt/kibana3. That means 
you'll see the following files within /opt/kibana3/kibana-3.1.0:
app  build.txt  config.js  css  favicon.ico  font  img  index.html 
 LICENSE.md  README.md  vendor

So then create the /opt/elk/plugins/kibana3 directory. Then:
$ ln -s  /opt/kibana3/kibana-3.1.0 /opt/elk/plugins/kibana3/_site

Now when you start ES and point it to the correct configuration file which 
in turn points it to the plugins directory as described above, Kibana will 
be available at the following URL (assuming you're on the same host; change 
localhost as needed, of course):

http://localhost:9200/_plugin/kibana3/

Hope this helps!

Brian



Re: Copy index from production to development instance

2014-06-26 Thread Brian Lamb
Thank you for your suggestion. I tried the stream2es library but I get an 
OutOfMemoryError when trying to use that.

On Friday, June 6, 2014 5:13:19 PM UTC-4, Antonio Augusto Santos wrote:

 Take a look at stream2es https://github.com/elasticsearch/stream2es

 On Friday, June 6, 2014 2:13:06 PM UTC-3, Brian Lamb wrote:

 I should also point out that I had to edit a file in the 
 metadata-snapshot file to change around the s3 keys and bucket name to 
 match what development was expecting.

 On Friday, June 6, 2014 1:11:57 PM UTC-4, Brian Lamb wrote:

 Hi all,

 I want to do a one time copy of the data on my production elastic search 
 instance to my development elastic search instance. Both are managed by AWS 
 if that makes this easier. Here is what I tried:

 On production:

 curl -XPUT 'http://localhost:9200/_snapshot/my_s3_repository' -d '{
     "type": "s3",
     "settings": {
         "access_key": "productionAccessKey",
         "bucket": "productionBucketName",
         "region": "region",
         "secret_key": "productionSecretKey"
     }
 }'
 curl -XPUT "http://localhost:9200/_snapshot/my_s3_repository/snapshot_2014_06_02"

 What this does is upload the instance to a production level s3 bucket.

 Then in the aws console, I copy all of it to a development level s3 
 bucket.

 Next on development:

 curl -XPUT 'http://localhost:9200/_snapshot/my_s3_repository' -d '{
     "type": "s3",
     "settings": {
         "access_key": "developmentAccessKey",
         "bucket": "developmentBucketName",
         "region": "region",
         "secret_key": "developmentSecretKey"
     }
 }'
 curl -XPOST "http://localhost:9200/_snapshot/my_s3_repository/snapshot_2014_06_02/_restore"

 This gives me the following message:

 $ curl -XPOST "http://localhost:9200/_snapshot/my_s3_repository/snapshot_2014_06_02/_restore?pretty=true"
 {
   "error" : "SnapshotException[[my_s3_repository:snapshot_2014_06_02] failed to get snapshots]; nested: IOException[Failed to get [snapshot-snapshot_2014_06_02]]; nested: AmazonS3Exception[Status Code: 404, AWS Service: Amazon S3, AWS Request ID: RequestId, AWS Error Code: NoSuchKey, AWS Error Message: The specified key does not exist.]; ",
   "status" : 500
 }

 Also, when I try to get the snapshots, I get the following:

 $ curl -XGET "localhost:9200/_snapshot/_status?pretty=true"
 {
   "snapshots" : [ ]
 }

 This leads me to believe that I am not connecting the snapshot correctly 
 but I'm not sure what I am doing incorrectly. Regenerating the index on 
 development is not really a possibility as it took a few months to generate 
 the index the first time around. If there is a better way to do this, I'm 
 all for it. 

 Thanks,

 Brian Lamb





Returning Many Large Documents and Performace

2014-06-25 Thread Brian Behling
I'm executing a query where I could possibly return 100k results. The 
documents are quite large, about 3.6 KB per document, or about 312 MB for 
100k of these.

When executing the query in ES, the query itself is somewhat fast, about 5 
seconds. But it takes longer than a minute to get the results back from the 
server.

What can be done to improve this performance? Is ElasticSearch not meant to 
handle such large documents?



Pagination: Determine Page Number Of A Record

2014-06-25 Thread Brian Behling
I have a requirement where a document could be anywhere in a result set and 
I need to calculate a page number according where this document is in the 
results. I've been trying many different ideas such as using a script to 
calculate the page number based on the total count and a counter variable, 
but the counter keeps getting reset every time a shard is queried.

I also tried returning the entire result set and calculating this value in 
.net, but ES takes too long to complete a query request for sizes of 8000 
or more. I realize we shouldn't be returning these many features, but the 
scan and scroll is not an option because I will need to parse each response 
to see if the document I'm looking for is in it. 

From and size also won't work because I have no idea what the 'from' value 
will be, and that is the value I'm trying to calculate.

I guess my question is, does any one have an idea of how to calculate a 
page number for a given document inside a query result?

Perhaps there is some functionality in ES that will tell you a document 
with a certain ID is the n'th document in the entire result set?



Re: Proper parsing of String values like 1m, 1q HOUR etc.

2014-06-24 Thread Brian
Thomas,

The TimeValue class handles precisely defined time periods (well, pretty 
much, anyway). In other words, 1s is one second. 1w is always 7d (leap 
seconds notwithstanding, but that doesn't really affect the precision).

But what is one year? 365 days? 365.25 days? 366 days in a leap year?

What is one quarter? Exactly 91.25d (which is 365 / 4)? Or 3 months? 

But then, what is a month? 28 days? 31 days? Use 28d or 31d if that's what 
you mean; 1 month has no deterministic meaning all by itself. And 1 quarter 
is 3 months but without any deterministic way to convert to a precise 
number of milliseconds.

The TimeValue class has no support for locale nor day of year nor leap year 
nor days in a month. It's best to use Joda time if you wish to perform 
proper year-oriented calculations. And it will return milliseconds 
precision if you wish, which will plug directly back into a TimeValue.
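
A rough illustration of the split I'm describing (assuming the 1.x 
TimeValue class and the Joda-Time library that Elasticsearch already 
ships):

import org.elasticsearch.common.unit.TimeValue;
import org.joda.time.DateTime;
import org.joda.time.Period;

public class DurationExample {
    public static void main(String[] args) {
        // Precise, calendar-free durations parse fine:
        TimeValue week = TimeValue.parseTimeValue("1w", null);
        System.out.println(week.millis());  // always 7 days' worth of millis

        // "One year" only has a length relative to a starting instant:
        DateTime now = DateTime.now();
        long oneYear = Period.years(1).toDurationFrom(now).getMillis();
        System.out.println(oneYear);  // 365 or 366 days, depending on 'now'
    }
}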

Brian



Filtering on Script Value

2014-06-24 Thread Brian Behling
I'm trying to calculate a value for each hit then select or filter on a 
calculated value. Something like below: 

"query": {
    "match_all": {}
},
"script_fields" : {
    "counter" : {
        "script" : "count++",
        "params" : {
            "count" : 1
        }
    },
    "source" : {
        "script" : "_source"
    }
}

I'd like to filter on the count parameter.

I've read on a StackOverflow post that you cannot filter on a script value.

So is there another way to calculate some value dynamically and filter on 
that value? 

If not, is there a nested SQL SELECT equivalent in ElasticSearch? Maybe I 
could execute the first query to calculate the 'count' then execute another 
query to filter by a value?



Re: script_fields vs filter script

2014-06-24 Thread Brian Behling
I'm trying to filter on a calculated script field as well.

Have you figured this out Kajal?

On Tuesday, May 27, 2014 10:49:35 AM UTC-6, Kajal Patel wrote:

 Hey, 

 Can you actually post your solution If you figured out. 
 I am having similar issue, I need to filter search result based on 
 script_field. I don't want to use filter_script though because I am using 
 facets and I want my records to filter out for facets too.

 Do you know if can extends any class or any any plugin or anything to 
 filter my records based on the script field.


 On Sunday, July 7, 2013 1:21:38 PM UTC-4, Oreno wrote:

 Hi Alex,
 1.I checked the cash solution but its taking 15 times more then my 
 starting time (10s against 150s), so that will be a problem since my filter 
 has dynamic params. 
 It does go fast once it's stored though. Do you know if it's  possible to 
 do some kind of cashing for all source documents for future queries?

 2.From what I understand ,both the filter script and the script_field are 
 suppose to go over each document that results from the prior query.
 The only thing I can think of that makes the difference is that the 
 script_filter actually needs to filter the false documents (for the hit 
 count) while the script_field only
 needs to add the field for the first 10 document returning by default.

 I'm trying to figure out how I can speed the response when using source() 
 on native java script. 
 I'm assuming the bottle neck is somewhere within creating the response. I 
 read that using source has some overhead  because elasticsearch has to 
 parse the json source,
 but if that was the case here, then I should have received the same big 
 overhead for both  script_field and  filter script runs.

 All I actually need is the hit count so if I'm correct about the response 
 parsing and that can be excluded I'll be really glad.

 Any idea on the above?

 Appreciating your help.

 Oren


 On Sun, Jul 7, 2013 at 7:13 PM, Alexander Reelsen-2 [via ElasticSearch 
 Users] [hidden email] 
 http://user/SendEmail.jtp?type=nodenode=4037661i=0 wrote:

 Hey,

 what kind of query are you executing? Using script fields results in the 
 scipt only being executed for each search hit, whereas executing it as a 
 script filter it might need to execute for each document in your index (you 
 can try to cache the script filter so it might be faster for subsequent 
 requests).

 Hope this helps as a start for optimization, if not, please provide some 
 more information.


 --Alex


 On Sun, Jul 7, 2013 at 2:21 PM, oreno [hidden email] 
 http://user/SendEmail.jtp?type=nodenode=4037659i=0 wrote:

 Hi, I notice that using a script_fields that returns true or false 
 values is
 going much faster then
 using the same script but with filter script declaration (so it will 
 filter
 the docs returning false).

 I was sure that the  filter script is taking so long because I'm using 
 the
 source().get(...) method, but turns out that when using the same script,
 only with script_fields  instead, I'm receiving the performance I need. 
 the
 only problem here is that I want to filter the docs that now have
 MessageReverted = false.

 1.Any way I can filter the docs containing  MessageReverted = false 
 ?(some
 wrapper query?)
 2. Any idea way the filter script is taking much longer then the script
 field(8000 mill against 250 mill)?

 both ways are retrieving the source() for the script logic so it can't 
 be a
 matter of source fetching as far as I understand.

 fast:
 ...,
   script_fields: {
 MessageReverted: {
   script: revert,
   lang: native,
   params: {
 startDate: 2013-05-1,
 endDate: 2013-05-1,
 attributeId: 2365443,
 segmentId: 2365443
   }
 }
   }


 slow:
 ...,
   filter: {
 script: {
   script: revert,
   lang: native,
   params: {
 startDate: 2013-05-1,
 endDate: 2013-05-1,
 attributeId: 2365443,
 segmentId: 2365443
   }
 }
   }


 Any idea?

 Thanks in advanced,

 Oren



Re: [logstash-users] Re: Kibana dashboards - A community repository

2014-06-23 Thread Brian
Thanks, Mark! That really helps a lot.

Starting with the excellent logstash book, this example 
https://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/templates/logstash/indexer.conf.erb
 
also helped quite a bit. It was referenced from here 
http://ci.openstack.org/logstash.html.

Brian

On Monday, June 23, 2014 6:51:56 AM UTC-4, Mark Walkom wrote:

 I'm definitely open to expanding this.

 I am thinking it might even grow to include LS configs (eg custom grok 
 patterns), as they are an important part of the visuals.

 Regards,
 Mark Walkom





Re: Count not working for Java API, works for REST

2014-06-23 Thread Brian
Perhaps you need to insert the execute().actionGet() method calls, as below?

CountRequestBuilder builder = client.prepareCount(indexName)
    .setTypes("product")
    .setQuery(getQuery(req));

*CountResponse response = builder.execute().actionGet();*

return response.getCount();


I don't use Count, but I have used Query and Update and Delete and they all 
work similarly in this regard. Just a guess.

Brian



Re: logstash CPU usage

2014-06-23 Thread Brian
Of the various logstash groups, the following is the one that I have found 
to be the most active and helpful: 
https://groups.google.com/forum/#!forum/logstash-users

Brian



Re: Getting complete value from ElasticSearch query

2014-06-23 Thread Brian
Vinay,

To be more specific: 

If you don't ask for any fields, then _source is returned by default.

But if you ask for any fields at all, then _source is not included by 
default. Therefore, if you wish to include _source along with other fields, 
you must explicitly ask for _source along with those other fields.
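
In other words (hand-typed sketch; the index name and field names are only 
placeholders):

curl -XPOST 'http://localhost:9200/myindex/_search?pretty=true' -d '{
  "query"   : { "match_all" : {} },
  "fields"  : [ "host", "UUID" ],
  "_source" : true
}'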

Brian



Re: Bulk API possible bug

2014-06-23 Thread Brian
Hi, Pablo.

I remember reading that Elasticsearch will happily store an invalid JSON 
string as your _source.

From my usage of the Java API, I noticed that the Jackson library is used, 
but that only the stream parser is present. What this tells me is that ES 
is likely parsing your JSON token-by-token and has processed and indexed 
most of it. In other words, an error isn't an all-or-nothing situation. 
Since your syntax error happens at the very end of the document, 
Elasticsearch has indexed all of the document before it encounters the 
error.

My guess is that if the error was not at the very end of the document, then 
Elasticsearch would fail to process and index any information past the 
error, but would successfully process and index information (if any) before 
the error.

Brian



Re: Proper parsing of String values like 1m, 1q HOUR etc.

2014-06-23 Thread Brian
1w means one week, 12.3d means 12.3 days, 52w means 52 weeks, 4h means 4 
hours, 12.3ms means 12.3 milliseconds, 12 means 12 milliseconds but without 
the suffix the value must be an integer.

In other words, TimeValue supports the parsing of a String that contains a 
long integer digit string to mean milliseconds, or an integer or floating 
point digit string with a suffix. So a WEEK is represented as 1w or 7d, 
and an HOUR is represented as 1h or 60m.

So if you want to support your own vocabulary, then create a wrapper class 
that converts your own terms to TimeValue strings and then passes them 
into the TimeValue class.
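
A minimal sketch of such a wrapper (the vocabulary, the conventions chosen 
for month and quarter, and the class name are all mine, purely for 
illustration):

import org.elasticsearch.common.unit.TimeValue;

public class FriendlyTimeValue {
    /** Translate a small fixed vocabulary into TimeValue-compatible strings. */
    public static TimeValue parse(String value) {
        String s = value.trim().toUpperCase();
        if (s.equals("HOUR"))    return TimeValue.parseTimeValue("1h", null);
        if (s.equals("WEEK"))    return TimeValue.parseTimeValue("1w", null);
        if (s.equals("MONTH"))   return TimeValue.parseTimeValue("31d", null);  // pick your own convention
        if (s.equals("QUARTER")) return TimeValue.parseTimeValue("93d", null);  // 3 x 31d, again a convention
        return TimeValue.parseTimeValue(value, null);  // fall back to the normal 1m, 12.3d, ... syntax
    }
}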

Brian

On Tuesday, June 17, 2014 11:31:37 AM UTC-4, Thomas wrote:

 Hi,

 I was wondering whether there is a proper Utility class to parse the given 
 values and get the duration in milliseconds probably for values such as 1m 
 (which means 1 minute) 1q (which means 1 quarter) etc.

 I have found that elasticsearch utilizes class TimeValue but it only 
 parses up to week, and values such as WEEK, HOUR are not accepted. So is in 
 elasticsearch source any utility class that does the job ? (for Histograms, 
 ranges wherever is needed)

 Thank you
 Thomas





Re: issues with file input from logstash to elastic - please read

2014-06-22 Thread Brian
Thanks so much for the feedback, Ivan.

One more question: We have two different forms of rotated files (on *IX 
systems; no Windows servers):
1. Standard log4j rotation: The XXX.log file is renamed to XXX-date.log 
and a new XXX.log file is created. The name doesn't change, but the inode 
changes.
2. When we switched many of our applications to use log4j2, we don't rotate 
the log files using log4j2. Instead, we have a cron job that, once per 
hour, makes a copy of the XXX.log file and then truncates the XXX.log file; 
in the background it compresses the copy. In this case, the name doesn't 
change, the inode doesn't change, but the size suddenly drops to 0 before 
it starts filling again from the beginning.
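
(For reference, that hourly job is essentially the classic copy-then-truncate 
pattern; a stripped-down sketch with invented paths:)

#!/bin/sh
# run hourly from cron: keep the inode, drop the size to zero
cp /var/log/app/XXX.log /var/log/app/XXX.$(date +%Y%m%d%H).log
: > /var/log/app/XXX.log        # truncate in place; name and inode unchanged
gzip /var/log/app/XXX.$(date +%Y%m%d%H).log &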

The GNU tail -F command handles both of these equally perfectly. Does 
logstash also handle both of these cases? Thanks in advance!

P.S. I am not a logstash expert either, but it's been a lot of fun to 
rediscover Elasticsearch from the ELK perspective (auto-mapping, 
auto-creation of indices, and so on).

Brian

On Saturday, June 21, 2014 10:42:37 AM UTC-4, Ivan Brusic wrote:

 The path shows an windows file name, so I am not sure if using tail would 
 work. On cygwin, there is no -F option, at least on the version I use. On 
 Linux, the file input works great, especially with rotated file. 

 I am not a Logstash expert, but I use the file input with the sincedb 
 option (sincedb_path) and it has worked since day one.

 -- 
 Ivan




Re: Splunk vs. Elastic search performance?

2014-06-20 Thread Brian
Thomas,

Thanks for your insights and experiences. As I am someone who has explored 
and used ES for over a year but is relatively new to the ELK stack, your 
data points are extremely valuable. Let me offer some of my own views.

Re: double the storage. I strongly recommend ELK users to disable the _all 
field. The entire text of the log events generated by logstash ends up in 
the message field (and not @message as many people incorrectly post). So 
the _all field is just redundant overhead with no value add. The result is 
a dramatic drop in database file sizes and dramatic increase in load 
performance. Of course, you need to configure ES to use the message field 
as the default for a Lucene Kibana query.
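
(Concretely, that is a one-line addition to elasticsearch.yml, using the 
standard per-index query default setting:)

index.query.default_field: message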

During the year that I've used ES and watched this group, I have been on 
the front line of a brand new product with a smart and dedicated 
development team working steadily to improve the product. Six months ago, 
the ELK stack eluded me and reports weren't encouraging (with the sole 
exception of the Kibana web site's marketing pitch). But ES has come a long 
way since six months ago, and the ELK stack is much more closely integrated.

The Splunk UI is carefully crafted to isolate users from each other and 
prevent external (to the Splunk db itself, not to our company) users from 
causing harm to data. But Kibana seems to be meant for a small cadre of 
trusted users. What if I write a dashboard with the same name as someone 
else's? Kibana doesn't even begin to discuss user isolation. But I am 
confident that it will.

How can I tell Kibana to set the default Lucene query operator to AND 
instead of OR? Google is not my friend: I keep getting references to the 
Ruby versions of Kibana; that's ancient history by now. Kibana is cool and 
promising, but it has a long way to go for deployment to all of the folks 
in our company who currently have access to Splunk.

Logstash has a nice book that's been very helpful, and logstash itself has 
been an excellent tool for prototyping. The book has been invaluable in 
helping me extract dates from log events and handling all of our different 
multiline events. But it still doesn't explain why the date filter needs a 
different array of matching strings to get the date that the grok filter 
has already matched and isolated. And recommendations to avoid the 
elasticsearch_http output and use elasticsearch (via the Node client) 
directly contradict the fact that logstash's 1.1.1 version of the ES client 
library is not compatible with the most recent 1.2.1 version of ES.

And logstash is also a resource hog, so we eventually plan to replace it 
with Perl and Apache Flume (already in use) and pipe it into my Java bulk 
load tool (which is always kept up-to-date with the versions of ES we 
deploy!!). Because we send the data via Flume to our data warehouse, any 
losses in ES will be annoying but won't be catastrophic. And the front-end 
following of rotated log files will be done using the GNU *tail -F* command 
and option. This GNU tail command with its uppercase -F option follows 
rotated log files perfectly. I doubt that logstash can do the same, and we 
currently see that neither can Splunk (so we sporadically lose log events 
in Splunk too). So GNU tail -F piped into logstash with the stdin filter 
works perfectly in my evaluation setup and will likely form the first stage 
of any log forwarder we end up deploying,

Brian

On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:

 We had a 2,2TB/d installation of Splunk and ran it on VMWare with 12 
 Indexer and 2 Searchheads. Each indexer had 1000IOPS guaranteed assigned. 
 The system is slow but ok to use. 

 We tried Elasticsearch and we were able to get the same performance with 
 the same amount of machines. Unfortunately with Elasticsearch you need 
 almost double the amount of storage, plus a LOT of patience to make it run. It 
 took us six months to set it up properly, and even now, the system is quite 
 buggy and unstable and from time to time we lose data with Elasticsearch. 

 I don't recommend ELK for a critical production system; for just dev work 
 it is ok, if you don't mind the hassle of setting up and operating it. The 
 costs you save by not buying a Splunk license you have to invest into 
 consultants to get it up and running. Our dev teams hate Elasticsearch and 
 prefer Splunk.



Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)

2014-06-20 Thread Brian
Patrick,

Here's my template, along with where the _all field is disabled. You may 
wish to add this setting to your own template, and then also add the index 
setting to ignore malformed data (if someone's log entry occasionally slips 
in null or no-data instead of the usual numeric value):

{
  "automap" : {
    "template" : "logstash-*",
    "settings" : {
      "index.mapping.ignore_malformed" : true
    },
    "mappings" : {
      "_default_" : {
        "numeric_detection" : true,
        "_all" : { "enabled" : false },
        "properties" : {
          "message" : { "type" : "string" },
          "host" : { "type" : "string" },
          "UUID" : { "type" : "string", "index" : "not_analyzed" },
          "logdate" : { "type" : "string", "index" : "no" }
        }
      }
    }
  }
}
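
To illustrate what index.mapping.ignore_malformed buys you (the index, type, 
and field names below are hypothetical), a log entry whose numeric field 
arrives with junk in it is still indexed; only the malformed field is 
skipped instead of the whole document being rejected:

curl -XPOST 'http://localhost:9200/logstash-2014.06.20/logs' -d '{
  "message"  : "request completed",
  "duration" : "null"
}'

Assuming earlier documents caused duration to be detected as a numeric 
field, this request would fail with a mapping exception without the 
ignore_malformed setting, and succeed (minus the bad field) with it.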

Brian



Re: Splunk vs. Elastic search performance?

2014-06-20 Thread Brian
Mark,

I've read one post (I can't remember where) saying that the Node client was 
preferred, but I have also read that the HTTP interface adds only minimal 
overhead. So yes, I am currently using logstash with the HTTP interface and 
it works fine.

I also performed some experiments with clustering (not much, due to 
resource and time constraints) and used unicast discovery. Then I read a 
post that strongly recommended multicast discovery, and I started to feel 
like I'd gone down the wrong path. Then I watched the ELK webinar and heard 
that unicast discovery was preferred. I think it's not a big deal either 
way; it's what works best for your particular networking infrastructure.

In addition, I was recently given this link: 
http://aphyr.com/posts/317-call-me-maybe-elasticsearch. It hasn't dissuaded 
me at all, but it is a thought-provoking read. I am a little confused by 
some things, though. In all of my high-performance banging on ES, even with 
my time-to-live test feature enabled, I never lost any documents at all. 
But I wasn't using auto-id; I was specifying my own unique ID. And when run 
in my 3-node cluster (slow due to being hosted by 3 VMs running on a 
dual-core machine), I still didn't lose any data. So I am not sure about the 
high data loss scenarios he describes in his missive; I have seen no 
evidence of any data loss due to false insert positives at all.
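
For anyone wondering what I mean by specifying my own unique ID, here is the 
difference in a nutshell (the index, type, and ID below are made up for 
illustration):

# Explicit, caller-chosen ID: a retried insert simply overwrites the same
# document instead of creating a duplicate, and a false positive is easy to
# detect by re-reading that ID.
curl -XPUT 'http://localhost:9200/logs-2014.06.20/event/host01-000042' -d '{
  "message" : "test event"
}'

# Auto-generated ID: every POST creates a brand-new document, so a retry
# after a timeout can silently duplicate an event.
curl -XPOST 'http://localhost:9200/logs-2014.06.20/event' -d '{
  "message" : "test event"
}'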

Brian

On Friday, June 20, 2014 6:30:27 PM UTC-4, Mark Walkom wrote:

 I wasn't aware that the elasticsearch_http output wasn't recommended?
 When I spoke to a few of the ELK devs a few months ago, they indicated 
 that there was minimal performance difference, with the added benefit of 
 not being locked to specific LS+ES versioning.

 Regards,
 Mark Walkom





Re: issues with file input from logstash to elastic - please read

2014-06-20 Thread Brian
Eitan,

My recommendation is to use the stdin input in logstash and avoid its file 
input. Then, for testing you pipe the file into your logstash instance. But 
in production, you should run the GNU version of *tail -F* (uppercase F 
option) to correctly follow all forms of rotated logs, and then pipe that 
output into your logstash instance.

I don't know just how robust logstash's file input is, but the GNU version 
of tail with the -F option is perfect, so there's no guesswork and no 
dependency on hope. Note that even Splunk has a currently open bug with 
losing data while trying to follow a rotated file.

Also, I added the multiline processing to the filters; it didn't seem to 
work when applied as a stdin codec. Now everything works very well together.
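
Here is a stripped-down sketch of that arrangement (the multiline pattern 
and the output host are illustrative only, not our exact configuration):

input {
  # lines arrive on stdin, piped in from GNU tail -F
  stdin { }
}

filter {
  # multiline handled as a filter rather than as a stdin codec; this example
  # pattern treats lines that begin with whitespace as continuations
  multiline {
    pattern => "^\s"
    what => "previous"
  }
}

output {
  elasticsearch_http {
    host => "localhost"
  }
}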

Anyway, that's what our group is doing.

And yes, the logstash-users 
https://groups.google.com/forum/#!forum/logstash-users group is also 
rather active and is a good place for logstash-specific help.

Brian



Re: Index template requires settings object even if its value is empty

2014-06-17 Thread Brian
By the way, I got a little ahead of myself in the previous post. In 
particular:

"settings" : {
  "index.mapping.ignore_malformed" : true,
  "index.query.default_field" : "message"
},

Apparently, when I added the index.query.default_field setting above and 
then removed the following option from my ES 1.2.1 start-up script, Kibana 
was no longer able to search on just HTTP and instead required message:HTTP, 
because the _all field has also been disabled:

-Des.index.query.default_field=message

So I put the configuration option (above) back into my ES start-up script, 
and removed the index.query.default_field setting from the template (as it 
didn't seem to work). Not sure if this is a problem with my understanding (most likely) 
or a bug in ES (very unlikely). But I offer it to the experts for comment 
and correction.
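
For completeness, my understanding is that the same start-up flag can also 
be expressed as a node-level setting in elasticsearch.yml (shown below as an 
assumption, since I have only verified the -Des form myself):

# node-level default applied to indices created on this node
index.query.default_field: message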

But however it's supposed to work, ES rocks and I've managed to get several people 
up and running with a one-button (as it were) build, install, load, and 
test. Awesome job, Elasticsearch.com! You make me look good!

Brian



Re: Index template requires settings object even if its value is empty

2014-06-16 Thread Brian
Alex,

I am running ES version 1.2.1.

It seemed to work (no errors in the logs), but I did it as an on-disk 
template and not via PUT. And without the settings, it behaved as if it 
wasn't there.

The question is now moot, because I actually need the following setting:

"settings" : {
  "index.mapping.ignore_malformed" : true,
  "index.query.default_field" : "message"
},

I don't have a problem fiddling with local files; Elasticsearch, the 
wrapper script, and everything else I need is stored in a single zip 
archive that our operations team can easily install. So once I install it 
on my laptop and verify that it's working, it's 100% repeatable when 
installed on any QA or production server.

I also configure logstash's elasticsearch_http output as follows:

manage_template => false

That way, I don't have to depend on logstash (or anything else) doing that 
for me. It's already done by the base ES install package.
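
Roughly, the relevant output block in my logstash configuration looks like 
this (the host value is just a placeholder):

output {
  elasticsearch_http {
    host => "localhost"
    # leave template management to the template shipped with our ES install
    manage_template => false
  }
}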

Brian


On Monday, June 16, 2014 8:03:33 AM UTC-4, Alexander Reelsen wrote:

 Hey,

 which ES version are you using? Seems to work with the latest version. You 
 can also use the index template API, so you do not have to fiddle with 
 local files (and copy them when adding new nodes).

 PUT _template/automap
 {
   "template": "*",
   "mappings": {
     "_default_": {
       "numeric_detection": true,
       "properties": {
         "message": {
           "type": "string"
         },
         "host": {
           "type": "string"
         },
         "@version": {
           "type": "string"
         }
       }
     }
   }
 }



 --Alex


 On Tue, Jun 3, 2014 at 5:57 PM, Brian brian@gmail.com wrote:

 I am not sure if this is a problem or if it's OK.

 Working with the ELK stack I have switched direction, and instead of 
 locking down the Elasticsearch mappings I am now using its automatic 
 mapping functions. And by adding the following JSON template definition to 
 the /*path.to.config*/templates/automap.json file I can get numeric 
 fields automatically correctly mapped even though logstash always emits 
 their values as strings (45.6 instead of 45.6). Very nice!

 {
   "automap" : {
     "template" : "*",
     "settings" : { },
     "mappings" : {
       "_default_" : {
         "numeric_detection" : true,
         "properties" : {
           "message" : { "type" : "string" },
           "host" : { "type" : "string" },
           "@version" : { "type" : "string" }
         }
       }
     }
   }
 }

 When I removed the "settings" : { } entry entirely, it was as if the template 
 did not exist; the numeric detection was not enabled and all string values 
 were seen as strings even if they contained numbers. Because all of the 
 settings are being controlled within elasticsearch.yml and not the template 
 (e.g. number of shards, number of replicas, and so on), eliminating the 
 settings from the template is desired, even if I have to leave it in but 
 set its value to the empty JSON object.

 If this is the way it's supposed to work, that's OK. But I couldn't find 
 anything in the documentation about it, and just wanted to get a 
 verification either way.

 Thanks!

 Brian




