Re: Comparing arrays

2014-09-24 Thread Tim Uckun


On Tuesday, September 23, 2014 2:18:35 PM UTC+12, Tim Uckun wrote:
>
> I have documents with an array field. The array contains unique elements 
> only. In this case the data is strings, but it could be numbers.
>
> I want to search for documents using this array field. Ideally I would 
> like to pass in an array and find the "nearest match" meaning documents 
> which contain all or most of the elements in the array I passed in.  
>
> For example "find me all the documents with the array field similar to 
> ['this', 'that', 'the other thing']"
>
>
Hey guys, I hate to bump my own post, but does anybody have any hints on 
how this could be done?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0c705855-5684-45c5-902b-e310178f8e1d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Comparing arrays

2014-09-22 Thread Tim Uckun
I have documents with an array field. The array contains unique elements
only. In this case the data is strings, but it could be numbers.

I want to search for documents using this array field. Ideally I would like
to pass in an array and find the "nearest match" meaning documents which
contain all or most of the elements in the array I passed in.

For example "find me all the documents with the array field similar to
['this', 'that', 'the other thing']"

Thanks.
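
One way this kind of "nearest match" might be expressed, as a rough sketch only: a bool query with one term clause per element under "should", plus "minimum_should_match" so that most of the elements have to be present. Documents matching more of the elements should score higher. The field name "tags", the 60% threshold and the helper name nearest_match_query are made up for illustration, and the sketch assumes the field is indexed as not_analyzed so that 'the other thing' stays a single term.

```python
import json

def nearest_match_query(field, values, min_match="60%"):
    """Build a bool query matching documents that contain most of `values`."""
    # Drop duplicates while keeping the caller's order.
    unique_values = list(dict.fromkeys(values))
    return {
        "query": {
            "bool": {
                # One optional clause per element of the array passed in.
                "should": [{"term": {field: v}} for v in unique_values],
                # At least this share of the clauses has to match.
                "minimum_should_match": min_match,
            }
        }
    }

# "tags" is a hypothetical field name.
body = nearest_match_query("tags", ["this", "that", "the other thing"])
print(json.dumps(body, indent=2))
```

Lowering min_match loosens the match; "100%" would require every element to be present.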



Re: Cassandra + Elasticsearch or Just Elasticsearch for Primary data store.

2014-07-14 Thread Tim Uckun

> I'm just confused if Cassandra can really make a difference here, since 
> it looks to me like ES can suffice here.
>
If you are not going to be using Cassandra for indexing then there is no 
reason to have it. If you want durability in case something goes wrong with 
ES, you can just store your data in a log file before pumping it into ES. 
If for whatever reason something happens to your ES cluster, you can 
reconstruct it using the log files.
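
A minimal sketch of what that could look like, assuming one JSON document per line in an append-only file. The function names, the "events" index, the "event" type and the "id" key are all hypothetical; the replay step only builds the action/source line pairs that the bulk API accepts and leaves the actual HTTP call out.

```python
import json
import os
import tempfile

def append_to_log(log_path, doc):
    """Append one document to the log as a single JSON line."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(doc, sort_keys=True) + "\n")

def replay_log(log_path):
    """Yield every document in the log, in the order it was written."""
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            line = line.strip()
            if line:
                yield json.loads(line)

def to_bulk_lines(docs, index, doc_type):
    """Turn documents into action/source line pairs for a bulk request body."""
    lines = []
    for doc in docs:
        # Assumes each document carries its own "id" key.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

log_path = os.path.join(tempfile.mkdtemp(), "events.log")
append_to_log(log_path, {"id": 1, "msg": "first"})
append_to_log(log_path, {"id": 2, "msg": "second"})
print(to_bulk_lines(replay_log(log_path), "events", "event"))
```

Writing to the log before indexing means the log stays the source of truth, so a lost cluster can be rebuilt by replaying the file from the start.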

 



Reverse River?

2014-05-22 Thread Tim Uckun
I would like to have a river in reverse. Every time a document is inserted 
or modified I would like to push that into another destination like a 
database.  Ideally this would be async or maybe even in batches.

Has anybody done anything like this before?



Re: Capacity Planning with ElasticSearch

2014-04-23 Thread Tim Uckun
To follow up on this...

As a general rule, is it better to have one horse-sized index or a hundred 
duck-sized indices? I am thinking about those types of searches where you 
might frequently search a subset of the data. For example, keeping a 
separate index for every customer, because normally the app restricts itself 
to only dealing with one customer at a time. Or perhaps doing a compound 
split based on customer and year if your searches rarely go outside of the 
current year.

Thanks.
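
To make the customer-and-year split concrete, here is a small sketch of how the index names could be derived. The "customer-<id>-<year>" naming scheme and both helper names are invented for illustration; the only thing relied on is that a search can target several indices at once by listing them separated by commas.

```python
def index_name(customer_id, year):
    """Name of the index holding one customer's data for one year."""
    return "customer-{}-{}".format(customer_id, year)

def indices_for_search(customer_id, first_year, last_year):
    """Comma-separated index list for a search spanning several years."""
    years = range(first_year, last_year + 1)
    return ",".join(index_name(customer_id, year) for year in years)

# The common case: one customer, current year only.
print(index_name(42, 2014))
# The rare case: the same customer across two years.
print(indices_for_search(42, 2013, 2014))
```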





On Wednesday, April 23, 2014 4:07:58 PM UTC+12, Mark Walkom wrote:
>
> 1 and 2 - It'd probably be easiest to try this yourself :)
> 3 - not really, you should look into routing.
> 4 - only the index metadata is stored in memory. However doing 
> aggregations will pull the applicable data into memory.
> 5 - not sure.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 23 April 2014 13:04, Mohannad Saeed wrote:
>
>> Any experienced dude with ES to answer this??
>
>



cardinality not giving expected results.

2014-04-14 Thread Tim Uckun
I have roughly 50 million records in ES.  The data was generated 
artificially and is mostly duplicates.  I am executing the following query:

{
  size: 0,
  aggregations: {
    by_month: {
      date_histogram: {
        field: "time_stamp",
        interval: "1M",
        format: "-MM-dd HH:mm"
      },
      aggregations: {
        by_node_mac: {
          terms: {
            field: "node_mac"
          },
          aggregations: {
            cardinality: {field: 'device_mac'}
          }
        }
      }
    }
  }
}

I expect the cardinality of these to be in the hundreds, but I am getting 
hundreds of thousands and even millions as a count.

It looks like it's not counting the cardinality but actually counting the 
number of records.
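
For comparison, here is a sketch of the same nesting with the innermost aggregation given a name of its own and the "cardinality" type one level below that name, which is the general name-then-type shape aggregations use. Whether that is what causes the inflated numbers here is only a guess. The name "distinct_devices" and the helper function are invented; the fields and the precision_threshold option come from this thread. Note also that every bucket carries a doc_count, which is the number of records rather than the number of distinct values.

```python
import json

def monthly_distinct_devices(precision_threshold=1000):
    """Build date_histogram -> terms -> named cardinality sub-aggregation."""
    return {
        "size": 0,
        "aggregations": {
            "by_month": {
                "date_histogram": {"field": "time_stamp", "interval": "1M"},
                "aggregations": {
                    "by_node_mac": {
                        "terms": {"field": "node_mac"},
                        "aggregations": {
                            # "distinct_devices" is a name chosen for this sketch;
                            # the aggregation type sits one level below the name.
                            "distinct_devices": {
                                "cardinality": {
                                    "field": "device_mac",
                                    "precision_threshold": precision_threshold,
                                }
                            }
                        },
                    }
                },
            }
        },
    }

print(json.dumps(monthly_distinct_devices(), indent=2))
```

In the response, the distinct count would then be read from the "value" of "distinct_devices" inside each node_mac bucket, not from the bucket's doc_count.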




Re: subqueries and distinct counts.

2014-04-10 Thread Tim Uckun
I hate to bump my own post, but does anybody have any input on this at all?

On Thursday, April 10, 2014 11:59:43 AM UTC+12, Tim Uckun wrote:
>
> I want to do something like this.
>
> select date_trunc('month', time_stamp), sum(distinct_count) from (
>   select date_trunc('week', time_stamp) as time_stamp,
>          count(distinct field_name) as distinct_count
>   from blah
>   group by date_trunc('week', time_stamp)
> ) as weekly
> group by date_trunc('month', time_stamp)
>
> So basically I want to break up the data into weekly chunks and count the 
> distinct appearances of a value and then sum those up on a per monthly 
> basis. 
>
> In preparation for that I tried to do the subquery, which looks like this:
>
>   {
>     aggregations: {
>       by_month: {
>         date_histogram: {
>           field: "time_stamp",
>           interval: "1M",
>           format: "-MM-dd HH:mm"
>         },
>         aggregations: {
>           by_node_mac: {
>             terms: {
>               field: "node_mac"
>             },
>             aggregations: {
>               cardinality: {field: 'device_mac'}
>             }
>           }
>         }
>       }
>     }
>   }
>
> but I seem to be getting the wrong answers.  I am using fake data which 
> should give me very low numbers for the cardinality but it actually seems 
> to be counting the number of rows not the number of distinct items. The 
> numbers are outrageously high.
>
> I tried a precision threshold of 1000 and 100 but it seems to make no 
> difference.
>
>
>



subqueries and distinct counts.

2014-04-09 Thread Tim Uckun
I want to do something like this.

select date_trunc('month', time_stamp), sum(distinct_count) from (
  select date_trunc('week', time_stamp) as time_stamp,
         count(distinct field_name) as distinct_count
  from blah
  group by date_trunc('week', time_stamp)
) as weekly
group by date_trunc('month', time_stamp)

So basically I want to break up the data into weekly chunks and count the 
distinct appearances of a value and then sum those up on a per monthly 
basis. 
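
To pin down the numbers the SQL above is meant to produce, here is a small in-memory sketch of the same two steps: distinct values per week, then those weekly counts summed per month. Each week is attributed to the month of its Monday, which is what date_trunc('week', ...) followed by date_trunc('month', ...) does. The sample rows and function names are made up.

```python
from collections import defaultdict
from datetime import date, timedelta

def week_start(day):
    """Monday of the week containing `day`, like date_trunc('week', ...)."""
    return day - timedelta(days=day.weekday())

def monthly_sum_of_weekly_distinct(rows):
    """rows: iterable of (date, value). Returns {(year, month): total}."""
    # Step 1: collect the distinct values seen in each week.
    values_per_week = defaultdict(set)
    for day, value in rows:
        values_per_week[week_start(day)].add(value)

    # Step 2: add each week's distinct count to the month of its Monday.
    totals = defaultdict(int)
    for monday, values in values_per_week.items():
        totals[(monday.year, monday.month)] += len(values)
    return dict(totals)

rows = [
    (date(2014, 4, 1), "a"),   # week starting Mon 2014-03-31
    (date(2014, 4, 7), "a"),   # week starting Mon 2014-04-07
    (date(2014, 4, 8), "a"),   # same week, same value: counted once
    (date(2014, 4, 9), "b"),   # same week, new value
    (date(2014, 4, 15), "a"),  # week starting Mon 2014-04-14
]
print(monthly_sum_of_weekly_distinct(rows))
# -> {(2014, 3): 1, (2014, 4): 3}
```

Note that "a" is counted once per week it appears in, so the monthly total is not the same as a distinct count over the whole month.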

In preparation for that I tried to do the subquery, which looks like this:

  {
    aggregations: {
      by_month: {
        date_histogram: {
          field: "time_stamp",
          interval: "1M",
          format: "-MM-dd HH:mm"
        },
        aggregations: {
          by_node_mac: {
            terms: {
              field: "node_mac"
            },
            aggregations: {
              cardinality: {field: 'device_mac'}
            }
          }
        }
      }
    }
  }

but I seem to be getting the wrong answers.  I am using fake data which 
should give me very low numbers for the cardinality but it actually seems 
to be counting the number of rows not the number of distinct items. The 
numbers are outrageously high.

I tried a precision threshold of 1000 and 100 but it seems to make no 
difference.




Re: Min, max dates not formatting?

2014-04-09 Thread Tim Uckun


On Wednesday, April 9, 2014 7:49:13 PM UTC+12, Alexander Reelsen wrote:
>
> Hey,
>
> can you try to use date formats instead of the named identifiers like 
> date_time and see if it works? Also, can you check the exception in the 
> elasticsearch logs?
>
>
I tried the "-MM-dd .." format. It gives an exception and doesn't seem to 
like the key "format":

Failed to execute [org.elasticsearch.action.search.SearchRequest@2c5fbec] lastShard [true]
org.elasticsearch.search.SearchParseException: [blah][0]: from[-1],size[0]: Parse Failure [Failed to parse source [{"size":0,"aggregations":{"min_date":{"min":{"field":"time_stamp","format":"HH:mm"}},"max_date":{"max":{"field":"time_stamp"]]
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:634)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:507)
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:480)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:252)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.search.SearchParseException: [blah][0]: from[-1],size[0]: Parse Failure [Unknown key for a VALUE_STRING in [min_date]: [format].]
    at org.elasticsearch.search.aggregations.metrics.ValuesSourceMetricsAggregatorParser.parse(ValuesSourceMetricsAggregatorParser.java:68)
    at org.elasticsearch.search.aggregations.AggregatorParsers.parseAggregators(AggregatorParsers.java:114)
    at org.elasticsearch.search.aggregations.AggregatorParsers.parseAggregators(AggregatorParsers.java:77)
    at org.elasticsearch.search.aggregations.AggregationParseElement.parse(AggregationParseElement.java:60)
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:622)
    ... 11 more







Min, max dates not formatting?

2014-04-08 Thread Tim Uckun
I have a search like this

{
  size: 0,
  query: {
    match_all: {}
  },
  aggregations: {
    min_date: {min: {field: 'time_stamp', format: 'date_time'}},
    max_date: {max: {field: 'time_stamp', format: "-MM-dd HH:mm"}}
  }
}

This fails with a parse error. I can only seem to run this query without a
format, and then I get a unix timestamp. What can I do to get the timestamp
formatted properly?

Thanks.
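
As a stopgap, the raw value can be formatted on the client side. This is only a sketch: it assumes the min and max values come back as epoch milliseconds under a "value" key, and the response fragment and helper name below are made up for illustration.

```python
from datetime import datetime, timezone

def format_epoch_millis(value, pattern="%Y-%m-%d %H:%M"):
    """Format an epoch-milliseconds number as a UTC date string."""
    moment = datetime.fromtimestamp(value / 1000.0, tz=timezone.utc)
    return moment.strftime(pattern)

# Hypothetical response fragment; the numbers are invented.
aggregations = {
    "min_date": {"value": 1396915200000.0},
    "max_date": {"value": 1396963800000.0},
}
for name in ("min_date", "max_date"):
    print(name, format_epoch_millis(aggregations[name]["value"]))
# -> min_date 2014-04-08 00:00
# -> max_date 2014-04-08 13:30
```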



Re: java 8, elasticsearch, and MVEL

2014-04-07 Thread Tim Uckun
You should take a look at mirah. 



On Monday, April 7, 2014 11:12:39 AM UTC+12, kimchy wrote:
>
> We are planning to address this in Elasticsearch itself. The tricky bit is 
> the fact that we want to have a highly optimized concurrent scripting 
> engine. You can install the Rhino one, which should work for now; it's 
> pretty fast, and it allows for the type of execution we are after.
>
> We will report back with findings and progress.
>
> On Apr 6, 2014, at 14:29, joerg...@gmail.com  wrote:
>
> No, you are not the only one. MVEL breaks under Java 8 here. I use Java 8 
> with ES without scripting right now. For doc boosting, I will need 
> scripting desperately.
>
> I also want to migrate away from MVEL. My favorite is Nashorn because it 
> is part of Java 8 JDK, but I'm wrestling with thread safety issues - and my 
> tests show low performance to my surprise. 
>
> So I have tried to implement some other script languages as a plugin with 
> focus on JSR 223 (dynjs, jav8, luaj) but I'm stuck in the middle of getting 
> them to run and sorting out which script language implementation gives the 
> best performance and smartest resource usage behavior under ES.
>
> Jörg
>
>
> On Fri, Apr 4, 2014 at 9:11 PM, Paul Sanwald wrote:
>
>> it seems I'm the only one with this problem. perhaps I will migrate our 
>> scripts to javascript. I'll post back to the group with results.
>
>



ES instead of Cassandra.

2014-03-23 Thread Tim Uckun
Is ES a suitable replacement for Cassandra for an analytics platform? I 
need high-speed data ingestion, time series analysis, rollups, 
aggregations, etc.  Cassandra is often used for this kind of task, but it 
seems to me ES might be a suitable, if not better, replacement.

Cheers.
