Re: Flink + Druid example?

2017-04-10 Thread dromitlabs
Thank you for the information, I'll have a look.


Re: Flink + Druid example?

2017-04-10 Thread Steven Le Roux
Hi,

I'm head of @OvhMetrics, which is a cloud-scale managed time series
platform targeting IoT and monitoring.

We're also using @warp10io components with some glue and optimisations. The
storage layer is based on Apache HBase, which to me is an ideal compromise
between storage efficiency (bytes per data point, compression, no
indexing) and performance (range scan capacities, custom filters, ...).

This allows us to produce data in two ways: either through the HTTP
endpoint, or with MapReduce (MR) jobs targeting HBase directly, since
Warp10 has strong Hadoop integration.

Advantages of Warp10 vs Influx:
  - Warp10 is fully open source, Influx is not (clustering is not available
    as OSS)
  - Influx is good at ingestion but it needs your data to come in order.
    Real-time use cases show that data points don't arrive in order (some
    are retained, buffering makes older points arrive after newer ones,
    etc.)
  - Warp10 has been measured at 1.8M data points/s per thread! (and not in
    an optimised case)
  - The true power of Warp10 is WarpScript: its query language, which adopts
    a data flow approach and has been designed for time series from the
    ground up. Our customers are doing truly amazing things with
    WarpScript, which contains nearly 800 functions... It brings analytics
    and signal processing to your time series data
  - Warp10 can be deployed either standalone (in-memory or LevelDB) or in
    distributed mode (HBase)
  - Security is mandatory and does not affect performance
  - You can easily delete massive ranges of data or just a single point.
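
To make the ordering point concrete, here is a minimal Python sketch of how
buffering produces out-of-order arrival, and of one generic way a producer
could put points back in order within a bounded delay before writing them.
It is an illustration only: `reorder` and `max_delay` are hypothetical
names, and this is not a description of how Warp10 or InfluxDB work
internally.

```python
import heapq


def reorder(points, max_delay):
    """Yield (timestamp, value) points in timestamp order.

    A point is released once a point at least `max_delay` newer has been
    seen. Anything arriving later than that would still come out of order.
    """
    heap = []
    newest = None
    for ts, value in points:
        heapq.heappush(heap, (ts, value))
        newest = ts if newest is None else max(newest, ts)
        # Release every point old enough that we stop waiting for stragglers.
        while heap and heap[0][0] <= newest - max_delay:
            yield heapq.heappop(heap)
    # End of stream: flush whatever is still buffered, oldest first.
    while heap:
        yield heapq.heappop(heap)


# Arrival order as seen by the collector: timestamps 2, 4 and 5 show up late.
arrivals = [(1, "a"), (3, "c"), (2, "b"), (6, "f"), (4, "d"), (5, "e")]
ordered = list(reorder(arrivals, max_delay=3))
print(ordered)
```

The trade-off is latency: the larger `max_delay`, the more late points are
absorbed, but the longer every point waits before being written. A store
that accepts points in any order does not need this step at all.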


Matt, if you want a few metrics from our use of Warp10 inside OVH:
  - 450M unique series
  - nominal load of 1.5M data points/s
  - delete rate of 10M data points/s
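
For a sense of scale, the figures above can be turned into a few derived
numbers. This is back-of-the-envelope arithmetic only; it assumes the
nominal load is sustained all day and spread evenly across series, which a
real workload will not do.

```python
SERIES = 450_000_000        # unique series
INGEST_PER_S = 1_500_000    # nominal load, data points per second
DELETE_PER_S = 10_000_000   # delete rate, data points per second

# Points ingested per day at the nominal load.
points_per_day = INGEST_PER_S * 86_400

# Average gap between two points of the same series, if load were uniform.
seconds_between_points = SERIES / INGEST_PER_S

# How much faster data can be deleted than it is ingested.
delete_to_ingest = DELETE_PER_S / INGEST_PER_S

print(f"{points_per_day:,} points/day")
print(f"{seconds_between_points:.0f} s between points per series (average)")
print(f"delete rate is {delete_to_ingest:.1f}x the ingest rate")
```

That is roughly 130 billion points per day, with an average series
receiving one point every five minutes.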


If you want to know more about Warp10, you can ask here:
https://groups.google.com/forum/#!forum/warp10-users


Regards,




Re: Flink + Druid example?

2017-04-10 Thread Alexis Gendronneau
hi,

Do you know http://www.warp10.io/ ? It's a geo time series database. As far
as I know, this technology can handle 100k+ points per node on ingestion,
and its query language is powerful. I have already used it to compute time
series correlations. I'm pretty sure you won't be disappointed by it.

Regards,


-- 
Alexis Gendronneau

alexis.gendronn...@corp.ovh.com
a.gendronn...@gmail.com


Re: Flink + Druid example?

2017-04-09 Thread Matt
I just noticed the first link is wrong; I intended to send [1] instead.

On a second look, InfluxDB's compression is considerably better than
Druid's, and the same goes for write and read performance. I'll have a
deeper look before committing to one.

[1]
https://cdn2.hubspot.net/hub/528953/hubfs/Screen_Shot_2016-08-27_at_00.32.42.png?t=1491606817725



Re: Flink + Druid example?

2017-04-08 Thread Matt
I compared them some days ago.

I found a useful article about many of the TSDBs available out there [1];
check the big table in the article, it's really helpful. The thing that
bothered me the most about InfluxDB was not being able to set up a cluster
using the open source distribution. That may not be a problem in the
future, but I would prefer to be able to do so now.

Regarding Druid there is also a really interesting talk by one of its
committers [2]. I liked some of the decisions they made regarding the way
queries are executed and the way the data is stored on disk (they have
taken some ideas from the search engine industry).

The other promising alternative is Prometheus. I haven't had a look at it
yet, but I plan to do so in the near future.

If anyone is using a time-series database and wants to tell us about it
that would be helpful!

Best regards,
Matt

[1] https://blog.netsil.com/a-comparison-of-time-series-databases-and-netsils-use-of-druid-db805d471206
[2] https://www.youtube.com/watch?v=vbH8E0nH2Nw



Re: Flink + Druid example?

2017-04-08 Thread Ted Yu
I found this related post:

https://groups.google.com/forum/#!topic/druid-user/Co5WUZOMnEk



Re: Flink + Druid example?

2017-04-08 Thread Traku traku
I'm using InfluxDB. I think InfluxDB is easier as a time-series database
solution.

Did you compare them?

Best regards.



Flink + Druid example?

2017-04-07 Thread Matt
Hi all,

I'm looking for an example of Tranquility (Druid's lib) as a Flink sink.

I'm trying to follow the code in [1] but I feel it's incomplete or maybe
outdated: it doesn't mention anything about another method (tranquilizer)
that seems to be part of the BeamFactory interface in the current version.

If anyone has any code or a working project to use as a reference, that
would be awesome for me and for the rest of us looking for a time-series
database solution!

Best regards,
Matt

[1] https://github.com/druid-io/tranquility/blob/master/docs/flink.md