Re: [DISCUSS][FEATURE] Rich Datatypes API

2019-10-30 Thread Xiangdong Huang
Hi,

> You can look at how avro handles non primitive types (they call it
> LogicalTypes) here:
> https://avro.apache.org/docs/1.8.1/spec.html#Logical+Types

Yes, I read some material about LogicalTypes. A logical type looks like a
nickname for a data type, with a new interpretation attached. E.g., a byte
array can be called a Decimal, while its interpretation relies on how the
user defines the precision and scale.
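
A minimal sketch of that reading of the decimal logical type, assuming the
Avro convention that the bytes hold the unscaled value as a big-endian
two's-complement integer and that the scale comes from the schema. The class
and method names are hypothetical; this is not Avro's or IoTDB's API.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper: interprets a raw byte array as a decimal "logical type".
final class DecimalLogicalType {
    private final int scale; // number of digits after the decimal point

    DecimalLogicalType(int scale) {
        this.scale = scale;
    }

    // The stored bytes are the unscaled integer (big-endian two's complement);
    // the scale is applied only when the value is read.
    BigDecimal read(byte[] stored) {
        return new BigDecimal(new BigInteger(stored), scale);
    }

    // Inverse direction: keep only the unscaled integer.
    // setScale throws ArithmeticException if the value has more fractional
    // digits than the configured scale (no silent rounding).
    byte[] write(BigDecimal value) {
        return value.setScale(scale).unscaledValue().toByteArray();
    }
}
```

With a scale of 2, the two bytes 0x01 0x2D (the integer 301) read back as
3.01; the stored type stays a plain byte array.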

Using this kind of implementation is also OK, I think.

So, would you like to provide the interface to users in the IoTDB layer
(operating on data with SQL), or on top of the TsFile layer (operating on
data with the TsFile API)?

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Julian Feinauer wrote on Wed, Oct 30, 2019 at 5:59 PM:

> Hi,
>
> in fact it is mostly in the MDF spec not for compression (that’s a nice
> side effect) but rather for being able to really express the (physical)
> content of a signal.
> So my initial idea was to implement it as an optional layer on top of the
> current tsfile which does the "interpretation". Because in the tsfile its
> always just a "primitive" series that is stored.
>
> So the idea would be to store some metadata (like a formula, lookup table,
> ...) on creation and use that on reading but only optionally.
> You can look at how avro handles non primitive types (they call it
> LogicalTypes) here:
> https://avro.apache.org/docs/1.8.1/spec.html#Logical+Types
> This is similar to my idea.
>
> Julian
>
> On 29.10.19, 14:40, "Xiangdong Huang" wrote:
>
> Hi,
>
> > Then it's most efficient to store integers and a formula like a * x + b
> > with e.g. b = 3 and a = 1/100.
> > So 3V would be stored as x = 0, 3.01V -> x = 1, ... 4.2V as x = 120.
> > So we only store 0 to 120 and no decimals and stuff, which would be very
> > easily compressible, I think.
>
> Good idea! Two thumbs up for that.
>
> But for cases like the above, implementing a new encoding method is better
> than a new data type.
>
> e.g., create time series root.a.b.voltage with encoding =
> linear_transformation and encoding_parameter = "describe the function like
> y=a * x + b" and datatype = INT.
>
> "linear_transformation" is the new encoding method.
>
> Now I get two cases from the discussion: one is like Optional data, and the
> other is data that can be transformed.
> So, do we want to support the above two, or find a more general data type
> for "rich data type" (can the MDF file provide some inspiration)?
>
> Best,
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> Julian Feinauer wrote on Tue, Oct 29, 2019 at 8:26 PM:
>
> > Hi Xiangdong,
> >
> > to your second question:
> > The use case is the other way round.
> > We know that we measure e.g. a voltage between 3V and 4.2V with a
> > precision of 0.01 or something.
> > Then it's most efficient to store integers and a formula like a * x + b
> > with e.g. b = 3 and a = 1/100.
> > So 3V would be stored as x = 0, 3.01V -> x = 1, ... 4.2V as x = 120.
> > So we only store 0 to 120 and no decimals and stuff, which would be very
> > easily compressible, I think.
> >
> > Julian
> >
> > On 29.10.19, 07:13, "Xiangdong Huang" wrote:
> >
> > Hi,
> >
> > > In Java we could model it as a variable Optional<> x which could be null,
> > > Optional.empty(), Optional.of(true), Optional.of(false).
> >
> > It makes sense. And using a new data type to achieve it in IoTDB is OK.
> >
> > > Or scale formulas like a*x+b which allows to leverage the precision even
> > > for “small” double values or even integers.
> >
> > So, are you considering a use case like: the time series values should be
> > [1, 1, 0, 0, 1, 1, 1, 0, 0...] but actually we get [0.99, 0.99, 0.01, 0,
> > 1, 1, 0.999, 0, 0.01] (because of the precision of sensors)?
> > And, what values do you want to save?
> > (1) Save them as 1 and 0. Or,
> > (2) Save them as 0.99, 0.01 as measured, but use a specific query API to
> > return data like 1 and 0?
> >
> > My other question is: is there a general data type that can support the
> > above cases?
> >
> > Best,
> > ---
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> >  黄向东
> > 清华大学 软件学院
> >
> >
> > Julian Feinauer wrote on Tue, Oct 29, 2019 at 3:58 AM:
> >
> > > Hi all,
> > >
> > > I wanted to discuss a possible new feature I will call Rich
> Datatypes
> > > (RDT) API in the following.
> > > I worked a lot in the automotive industry and th

Re: [jira] [Created] (IOTDB-284) Do not write empty page

2019-10-30 Thread Jialin Qiao
Hi,

I fixed this issue in this PR[1]. I put the "check value type" before 
constructing a ChunkWriter in AbstractMemtable.

[1] https://github.com/apache/incubator-iotdb/pull/491

Best,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

> -----Original Message-----
> From: "Jialin Qiao (Jira)"
> Sent: 2019-10-30 19:41:00 (Wednesday)
> To: dev@iotdb.apache.org
> Cc:
> Subject: [jira] [Created] (IOTDB-284) Do not write empty page
> 
> Jialin Qiao created IOTDB-284:
> -
> 
>  Summary: Do not write empty page
>  Key: IOTDB-284
>  URL: https://issues.apache.org/jira/browse/IOTDB-284
>  Project: Apache IoTDB
>   Issue Type: Improvement
> Reporter: Jialin Qiao
> 
> 
> SET STORAGE GROUP TO root.turbine;
> 
> CREATE TIMESERIES root.turbine.d2.s0 WITH DATATYPE=INT32, ENCODING=RLE;
> 
> insert into root.turbine.d2(timestamp,s0) values(2,25.3);
> 
> flush 
> 
>  
> 
> When receiving the insert statement, IoTDB will create a ChunkWriterImpl, but 
> the value will not be inserted because of type error.
> 
> The empty ChunkWriterImpl will be written when meeting the 'flush' command. 
> Then, the server will log an error:
> 
>  
> 
> 19:33:24.667 [pool-6-IoTDB-Flush-SubTask-ServerServiceImpl-thread-2] ERROR 
> org.apache.iotdb.tsfile.write.chunk.ChunkBuffer - Write page error, 
> [s0,INT32,RLE,{},UNCOMPRESSED], minTime:-9223372036854775808, maxTime:0
> 
>  
> 
> We can check the data type before creating the ChunkWriterImpl to avoid write 
> empty ChunkWriterImpl.
> 
> 
> 
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)


[Question] Any constraint about upload the python client to some package managing service like pip?

2019-10-30 Thread Xiangdong Huang
Hi,

As of 0.9.0-SNAPSHOT, we have begun to provide a Python client (generated by
Thrift, PR#444).

Thrift generates some .py files, and these files serve as a client
library for Python users.

When releasing IoTDB, we currently have two files: iotdb-source-release.zip
and iotdb-binary.zip.
It would be better if we could provide these Python files to users directly
(rather than letting users download the .thrift file and then compile it).
(I wonder whether we should call these Python files a "release" or a
"distribution", as they are generated by Thrift.)

So, there are two choices:

(1) When releasing IoTDB, besides providing the above two files, we can
provide a new one: iotdb-{version}-python-client.zip

(2) Upload the Python files to a package management service such as PyPI,
the index that `pip` installs from (similar to Maven Nexus), so that users
can get the Python package using `pip install apache-iotdb-client`.

If we want to upload the package for `pip`, is there any constraint from Apache?

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Re: Re: Duplicate fields in EngineDataSetWithoutValueFilter.java

2019-10-30 Thread 田原
Hi,

I have completed the issue. The PR link is 
https://github.com/apache/incubator-iotdb/pull/495.


> -----Original Message-----
> From: "田原"
> Sent: 2019-10-31 09:46:52 (Thursday)
> To: dev@iotdb.apache.org
> Cc:
> Subject: Re: Duplicate fields in EngineDataSetWithoutValueFilter.java
> 
> Hi,
> 
> I am working on this issue; I will replace the two fields with one TreeSet.
> As we don't use multiple threads in querying now, there is no need to use
> ConcurrentSkipListSet. But whenever thread safety is needed, we can
> simply switch to ConcurrentSkipListSet.
> 
> 
> > -----Original Message-----
> > From: "Yuan Tian (Jira)"
> > Sent: 2019-10-31 09:41:00 (Thursday)
> > To: dev@iotdb.apache.org
> > Cc:
> > Subject: [jira] [Created] (IOTDB-285) Duplicate fields in
> > EngineDataSetWithoutValueFilter.java
> > 
> > Yuan Tian created IOTDB-285:
> > ---
> > 
> >  Summary: Duplicate fields in 
> > EngineDataSetWithoutValueFilter.java
> >  Key: IOTDB-285
> >  URL: https://issues.apache.org/jira/browse/IOTDB-285
> >  Project: Apache IoTDB
> >   Issue Type: Improvement
> > Reporter: Yuan Tian
> > 
> > 
> > There are two fields in EngineDataSetWithoutValueFilter.java used to fetch the
> > minimum time.
> > 
> > {code:java}
> > // Some comments here
> > private PriorityQueue timeHeap;
> > private Set timeSet;
> > {code}
> > the Set is used to keep heap from storing duplicate time.
> > 
> > However, a TreeSet field can do both things. No duplicate time and ensure 
> > the time order. There is no need to use these two. 
> > Especially, when we want to change to multiThread version, to keep the 
> > timeHeapPut thread safe, we have to add a synchronized onto the method, 
> > like this:
> > 
> > {code:java}
> > private synchronized void timeHeapPut(long time) {
> >if (!timeSet.contains(time)) {
> >  timeSet.add(time);
> >  timeHeap.add(time);
> >}
> >  }
> > {code}
> > 
> > But, if we only use TreeSet, we can simply use the corresponding version, 
> > ConcurrentSkipListSet, to replace it.
> > 
> > 
> > 
> > 
> > 
> > --
> > This message was sent by Atlassian Jira
> > (v8.3.4#803005)


Re: Duplicate fields in EngineDataSetWithoutValueFilter.java

2019-10-30 Thread 田原
Hi,

I am working on this issue; I will replace the two fields with one TreeSet. As
we don't use multiple threads in querying now, there is no need to use
ConcurrentSkipListSet. But whenever thread safety is needed, we can
simply switch to ConcurrentSkipListSet.
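
A minimal sketch of the single-field version, assuming the timestamps are
boxed as Long. The class and method names are hypothetical and only stand in
for the bookkeeping in EngineDataSetWithoutValueFilter; this is not the code
from the PR.

```java
import java.util.TreeSet;

// Hypothetical stand-in for the timeHeap/timeSet pair.
final class TimeHeap {
    // A TreeSet keeps timestamps sorted and ignores duplicates on add().
    private final TreeSet<Long> times = new TreeSet<>();

    void put(long time) {
        times.add(time); // no separate contains() check is needed
    }

    boolean hasNext() {
        return !times.isEmpty();
    }

    // Removes and returns the smallest timestamp, like PriorityQueue.poll().
    // Callers are expected to check hasNext() first: pollFirst() returns
    // null on an empty set, which would fail when unboxed to long.
    long nextMinTime() {
        return times.pollFirst();
    }
}
```

Adding 5, 2, 5, 9 and then draining yields 2, 5, 9: the duplicate 5 is
dropped and the order is ascending without a second collection.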


> -----Original Message-----
> From: "Yuan Tian (Jira)"
> Sent: 2019-10-31 09:41:00 (Thursday)
> To: dev@iotdb.apache.org
> Cc:
> Subject: [jira] [Created] (IOTDB-285) Duplicate fields in
> EngineDataSetWithoutValueFilter.java
> 
> Yuan Tian created IOTDB-285:
> ---
> 
>  Summary: Duplicate fields in EngineDataSetWithoutValueFilter.java
>  Key: IOTDB-285
>  URL: https://issues.apache.org/jira/browse/IOTDB-285
>  Project: Apache IoTDB
>   Issue Type: Improvement
> Reporter: Yuan Tian
> 
> 
> There are two fields in EngineDataSetWithoutValueFilter.java used to fetch the
> minimum time.
> 
> {code:java}
> // Some comments here
> private PriorityQueue timeHeap;
> private Set timeSet;
> {code}
> the Set is used to keep heap from storing duplicate time.
> 
> However, a TreeSet field can do both things. No duplicate time and ensure the 
> time order. There is no need to use these two. 
> Especially, when we want to change to multiThread version, to keep the 
> timeHeapPut thread safe, we have to add a synchronized onto the method, like 
> this:
> 
> {code:java}
> private synchronized void timeHeapPut(long time) {
>if (!timeSet.contains(time)) {
>  timeSet.add(time);
>  timeHeap.add(time);
>}
>  }
> {code}
> 
> But, if we only use TreeSet, we can simply use the corresponding version, 
> ConcurrentSkipListSet, to replace it.
> 
> 
> 
> 
> 
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)


[jira] [Created] (IOTDB-285) Duplicate fields in EngineDataSetWithoutValueFilter.java

2019-10-30 Thread Yuan Tian (Jira)
Yuan Tian created IOTDB-285:
---

 Summary: Duplicate fields in EngineDataSetWithoutValueFilter.java
 Key: IOTDB-285
 URL: https://issues.apache.org/jira/browse/IOTDB-285
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Yuan Tian


There are two fields in EngineDataSetWithoutValueFilter.java used to fetch the
minimum time.

{code:java}
// Some comments here
private PriorityQueue<Long> timeHeap;
private Set<Long> timeSet;
{code}
The Set is used to keep the heap from storing duplicate times.

However, a single TreeSet field can do both things: it stores no duplicate
times and keeps the times in order, so there is no need to use these two fields.
In particular, when we want to change to a multi-threaded version, we have to
add synchronized to the timeHeapPut method to keep it thread safe, like
this:

{code:java}
private synchronized void timeHeapPut(long time) {
  if (!timeSet.contains(time)) {
    timeSet.add(time);
    timeHeap.add(time);
  }
}
{code}

But if we only use a TreeSet, we can simply replace it with its concurrent
counterpart, ConcurrentSkipListSet.
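
A minimal sketch of why the swap is cheap, assuming the field is declared
against the NavigableSet interface, which both TreeSet and
ConcurrentSkipListSet implement. The class and factory method names are
hypothetical.

```java
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical factory: the calling code only sees NavigableSet<Long>.
final class TimeSets {
    // Current single-threaded query path.
    static NavigableSet<Long> singleThreaded() {
        return new TreeSet<>();
    }

    // Possible multi-threaded path: same interface, thread-safe add/pollFirst.
    static NavigableSet<Long> threadSafe() {
        return new ConcurrentSkipListSet<>();
    }

    // No synchronized keyword and no duplicate check: add() on either set
    // already ignores values that are present.
    static void timeHeapPut(NavigableSet<Long> times, long time) {
        times.add(time);
    }
}
```

Only the factory call changes between the two variants; timeHeapPut and the
code that polls the minimum time stay the same.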





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Does anyone know how to monitor the access traffic of our website

2019-10-30 Thread Kevin A. McGrail
Perfection is the barrier to progress. I see no risk in doing it. I consider
this a very low-priority issue and recommend that the PMC proceed as-is
unless there is an actual, demonstrable risk.

On 10/30/2019 6:13 PM, Justin Mclean wrote:
> Hi Kevin,
> I don't disagree. I would assume if projects are asked to stop using it the
> Apache site would also no longer use it.
> Perhaps it's time to ask legal / board again on progress with this?
> Thanks,
> Justin
>
> On Wed, 30 Oct 2019, 22:58 Kevin A. McGrail,  wrote:
>
>> Sounds like some people who think it's good for the goose but not the
>> gander as the idiom goes.  If www.apache.org uses it, a project should
>> be ok to use it.  People are very used to using analytics and these are
>> large companies.  They are best setup to handle privacy issues like
>> GDPR.  A solution we roll ourselves is not.
>>
>>
>> On 10/30/2019 5:12 AM, Christofer Dutz wrote:
>>> Hi Xiangdong,
>>>
>>> That's the easy way ... I would strongly suggest not to do it.
>>> I 100% agree with Justin, not to use an external tracking service.
>>>
>>> Chris
>>>
>>>
>>>
>>> On 30.10.19, 09:51, "Xiangdong Huang" wrote:
>>>
>>> Hi,
>>>
>>> According to the jira discussion, sounds Apache Infra can support
>> the data:
>>> > If you need insight, please work with the infrastructure team - we
>> can
>>> provide you aggregated views of data.
>>>
>>> I think I can have a try to look what the data looks like from Infra.
>>>
>>>  > So far nothing official has been decided, but my reading of
>> various list
>>> (and other JIRA issues) is that projects will at some point in the
>> near
>>> future be asked not to use it.
>>> > The PPMC can of course decide to use it until that point.
>>>
>>> From my opinion, I'd like to enable the track using Google Analytics
>> for a
>>> while, as now iotdb is in a emerging stage and it may help us to
>>> improvement the website better.
>>>
>>>  Best,
>>> ---
>>> Xiangdong Huang
>>> School of Software, Tsinghua University
>>>
>>>  黄向东
>>> 清华大学 软件学院
>>>
>>>
>>> Justin Mclean wrote on Wed, Oct 30, 2019 at 4:27 PM:
>>>
>>> > Hi,
>>> >
>>> > See https://issues.apache.org/jira/browse/LEGAL-470
>>> >
>>> > Note: "Yes, please avoid using Google Analytics or any other
>> third-party
>>> > analytics solution that ships our users' data elsewhere. If you
>> need
>>> > insight, please work with the infrastructure team - we can provide
>> you
>>> > aggregated views of data.”
>>> >
>>> > So far nothing official has been decided, but my reading of
>> various list
>>> > (and other JIRA issues) is that projects will at some point in the
>> near
>>> > future be asked not to use it.
>>> >
>>> > The PPMC can of course decide to use it until that point.
>>> >
>>> > Thanks,
>>> > Justin
>>> >
>>> >
>>> >
>>>
>>>
>> --
>> Kevin A. McGrail
>> kmcgr...@apache.org
>>
>> Member, Apache Software Foundation
>> Chair Emeritus Apache SpamAssassin Project
>> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>>
>>
-- 
Kevin A. McGrail
kmcgr...@apache.org

Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171



Re: Does anyone know how to monitor the access traffic of our website

2019-10-30 Thread Justin Mclean
Hi Kevin,
I don't disagree. I would assume if projects are asked to stop using it the
Apache site would also no longer use it.
Perhaps it's time to ask legal / board again on progress with this?
Thanks,
Justin

On Wed, 30 Oct 2019, 22:58 Kevin A. McGrail,  wrote:

> Sounds like some people who think it's good for the goose but not the
> gander as the idiom goes.  If www.apache.org uses it, a project should
> be ok to use it.  People are very used to using analytics and these are
> large companies.  They are best setup to handle privacy issues like
> GDPR.  A solution we roll ourselves is not.
>
>
> On 10/30/2019 5:12 AM, Christofer Dutz wrote:
> > Hi Xiangdong,
> >
> > That's the easy way ... I would strongly suggest not to do it.
> > I 100% agree with Justin, not to use an external tracking service.
> >
> > Chris
> >
> >
> >
> > On 30.10.19, 09:51, "Xiangdong Huang" wrote:
> >
> > Hi,
> >
> > According to the jira discussion, sounds Apache Infra can support
> the data:
> > > If you need insight, please work with the infrastructure team - we
> can
> > provide you aggregated views of data.
> >
> > I think I can have a try to look what the data looks like from Infra.
> >
> >  > So far nothing official has been decided, but my reading of
> various list
> > (and other JIRA issues) is that projects will at some point in the
> near
> > future be asked not to use it.
> > > The PPMC can of course decide to use it until that point.
> >
> > From my opinion, I'd like to enable the track using Google Analytics
> for a
> > while, as now iotdb is in a emerging stage and it may help us to
> > improvement the website better.
> >
> >  Best,
> > ---
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> >  黄向东
> > 清华大学 软件学院
> >
> >
> > Justin Mclean wrote on Wed, Oct 30, 2019 at 4:27 PM:
> >
> > > Hi,
> > >
> > > See https://issues.apache.org/jira/browse/LEGAL-470
> > >
> > > Note: "Yes, please avoid using Google Analytics or any other
> third-party
> > > analytics solution that ships our users' data elsewhere. If you
> need
> > > insight, please work with the infrastructure team - we can provide
> you
> > > aggregated views of data.”
> > >
> > > So far nothing official has been decided, but my reading of
> various list
> > > (and other JIRA issues) is that projects will at some point in the
> near
> > > future be asked not to use it.
> > >
> > > The PPMC can of course decide to use it until that point.
> > >
> > > Thanks,
> > > Justin
> > >
> > >
> > >
> >
> >
> --
> Kevin A. McGrail
> kmcgr...@apache.org
>
> Member, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>
>


Re: Does anyone know how to monitor the access traffic of our website

2019-10-30 Thread Kevin A. McGrail
Sounds like some people think what's good for the goose is not good for the
gander, as the idiom goes.  If www.apache.org uses it, a project should
be OK to use it.  People are very used to using analytics, and these are
large companies.  They are best set up to handle privacy issues like
GDPR.  A solution we roll ourselves is not.


On 10/30/2019 5:12 AM, Christofer Dutz wrote:
> Hi Xiangdong,
>
> That's the easy way ... I would strongly suggest not to do it.
> I 100% agree with Justin, not to use an external tracking service. 
>
> Chris
>
>
>
> On 30.10.19, 09:51, "Xiangdong Huang" wrote:
>
> Hi,
> 
> According to the jira discussion, sounds Apache Infra can support the 
> data:
> > If you need insight, please work with the infrastructure team - we can
> provide you aggregated views of data.
> 
> I think I can have a try to look what the data looks like from Infra.
> 
>  > So far nothing official has been decided, but my reading of various 
> list
> (and other JIRA issues) is that projects will at some point in the near
> future be asked not to use it.
> > The PPMC can of course decide to use it until that point.
> 
> From my opinion, I'd like to enable the track using Google Analytics for a
> while, as now iotdb is in a emerging stage and it may help us to
> improvement the website better.
> 
>  Best,
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
> 
>  黄向东
> 清华大学 软件学院
> 
> 
> Justin Mclean wrote on Wed, Oct 30, 2019 at 4:27 PM:
> 
> > Hi,
> >
> > See https://issues.apache.org/jira/browse/LEGAL-470
> >
> > Note: "Yes, please avoid using Google Analytics or any other third-party
> > analytics solution that ships our users' data elsewhere. If you need
> > insight, please work with the infrastructure team - we can provide you
> > aggregated views of data.”
> >
> > So far nothing official has been decided, but my reading of various list
> > (and other JIRA issues) is that projects will at some point in the near
> > future be asked not to use it.
> >
> > The PPMC can of course decide to use it until that point.
> >
> > Thanks,
> > Justin
> >
> >
> >
> 
>
-- 
Kevin A. McGrail
kmcgr...@apache.org

Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171



[jira] [Created] (IOTDB-284) Do not write empty page

2019-10-30 Thread Jialin Qiao (Jira)
Jialin Qiao created IOTDB-284:
-

 Summary: Do not write empty page
 Key: IOTDB-284
 URL: https://issues.apache.org/jira/browse/IOTDB-284
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Jialin Qiao


SET STORAGE GROUP TO root.turbine;

CREATE TIMESERIES root.turbine.d2.s0 WITH DATATYPE=INT32, ENCODING=RLE;

insert into root.turbine.d2(timestamp,s0) values(2,25.3);

flush 

 

When receiving the insert statement, IoTDB will create a ChunkWriterImpl, but
the value will not be inserted because of a type error.

The empty ChunkWriterImpl will be written when the 'flush' command is executed.
Then, the server will log an error:

 

19:33:24.667 [pool-6-IoTDB-Flush-SubTask-ServerServiceImpl-thread-2] ERROR 
org.apache.iotdb.tsfile.write.chunk.ChunkBuffer - Write page error, 
[s0,INT32,RLE,{},UNCOMPRESSED], minTime:-9223372036854775808, maxTime:0

 

We can check the data type before creating the ChunkWriterImpl to avoid writing
an empty ChunkWriterImpl.
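
A minimal sketch of the "validate first, create the writer second" order,
using a simplified, hypothetical model of a memtable. None of the names below
are IoTDB classes; a plain list stands in for a chunk writer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified memtable model (not IoTDB's AbstractMemTable).
final class SketchMemTable {
    enum DataType { INT32, DOUBLE }

    // One value buffer per measurement, created lazily on first valid insert.
    private final Map<String, List<Object>> writers = new HashMap<>();

    // Parses the raw text according to the declared type.
    // Throws NumberFormatException when the text does not fit the type.
    static Object parse(DataType type, String raw) {
        switch (type) {
            case INT32:
                return Integer.parseInt(raw);
            case DOUBLE:
                return Double.parseDouble(raw);
            default:
                throw new IllegalArgumentException("unsupported type: " + type);
        }
    }

    // The type check runs before the writer is created, so a rejected value
    // leaves no empty writer behind for a later flush.
    boolean insert(String measurement, DataType type, String raw) {
        Object value;
        try {
            value = parse(type, raw);
        } catch (NumberFormatException e) {
            return false; // type error: nothing was created
        }
        writers.computeIfAbsent(measurement, k -> new ArrayList<>()).add(value);
        return true;
    }

    int writerCount() {
        return writers.size();
    }
}
```

In this model, inserting the text 25.3 into an INT32 measurement is rejected
and the writer count stays at zero, which mirrors the statement sequence in
the report.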



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (IOTDB-278) remove tsfile-format.properties

2019-10-30 Thread Jialin Qiao (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jialin Qiao closed IOTDB-278.
-
Fix Version/s: 0.9.0
   Resolution: Fixed

> remove tsfile-format.properties
> ---
>
> Key: IOTDB-278
> URL: https://issues.apache.org/jira/browse/IOTDB-278
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Jialin Qiao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> # merge tsfile-format.properties to iotdb-engine.properties
>  # add configuration class for tsfile, let users set configurations through 
> API
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS][FEATURE] Rich Datatypes API

2019-10-30 Thread Julian Feinauer
Hi,

In fact, in the MDF spec it is mostly not there for compression (that's a nice
side effect) but rather for being able to really express the (physical) content
of a signal.
So my initial idea was to implement it as an optional layer on top of the
current TsFile which does the "interpretation", because in the TsFile it's
always just a "primitive" series that is stored.

So the idea would be to store some metadata (like a formula, lookup table, ...) 
on creation and use that on reading but only optionally.
You can look at how avro handles non primitive types (they call it 
LogicalTypes) here: https://avro.apache.org/docs/1.8.1/spec.html#Logical+Types
This is similar to my idea.
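
A minimal sketch of such an optional interpretation layer for the linear
formula y = a * x + b discussed in this thread, assuming the metadata is just
the pair (a, b). The class name is hypothetical and nothing here is TsFile
API.

```java
// Hypothetical "rich datatype": integers are stored, physical values are read.
final class LinearScale {
    private final double a; // step size (precision), e.g. 1/100
    private final double b; // offset, e.g. 3

    LinearScale(double a, double b) {
        this.a = a;
        this.b = b;
    }

    // Physical value -> stored integer: x = round((y - b) / a).
    // Rounding absorbs floating-point noise in the division.
    int encode(double physical) {
        return (int) Math.round((physical - b) / a);
    }

    // Stored integer -> physical value: y = a * x + b.
    double decode(int stored) {
        return a * stored + b;
    }
}
```

With a = 1/100 and b = 3, the value 3 V is stored as 0, 3.01 V as 1 and
4.2 V as 120, so the primitive series only holds small integers and the
formula is applied on reading.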

Julian

On 29.10.19, 14:40, "Xiangdong Huang" wrote:

Hi,

> Then it's most efficient to store integers and a formula like a * x + b
> with e.g. b = 3 and a = 1/100.
> So 3V would be stored as x = 0, 3.01V -> x = 1, ... 4.2V as x = 120.
> So we only store 0 to 120 and no decimals and stuff, which would be very
> easily compressible, I think.

Good idea! Two thumbs up for that.

But for cases like the above, implementing a new encoding method is better
than a new data type.

e.g, create time series root.a.b.voltage with encoding =
linear_transformation and encoding_parameter = "describe the function like
y=a * x + b" and datatype = INT.

"linear_transformation" is the new encoding method.

Now I get two cases from the discussion: one is like Optional data, and the
other is data that can be transformed.
So, do we want to support the above two, or find a more general data type
for "rich data type" (can the MDF file provide some inspiration)?

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Julian Feinauer wrote on Tue, Oct 29, 2019 at 8:26 PM:

> Hi Xiangdong,
>
> to your second question:
> The use case is the other way round.
> We know that we measure e.g. a voltage between 3V and 4.2V with a
> precision of 0.01 or something.
> Then it's most efficient to store integers and a formula like a * x + b
> with e.g. b = 3 and a = 1/100.
> So 3V would be stored as x = 0, 3.01V -> x = 1, ... 4.2V as x = 120.
> So we only store 0 to 120 and no decimals and stuff, which would be very
> easily compressible, I think.
>
> Julian
>
> On 29.10.19, 07:13, "Xiangdong Huang" wrote:
>
> Hi,
>
> > In Java we could model it as a variable Optional<> x which could be null,
> > Optional.empty(), Optional.of(true), Optional.of(false).
>
> It makes sense. And using a new data type to achieve it in IoTDB is OK.
>
> > Or scale formulas like a*x+b which allows to leverage the precision even
> > for “small” double values or even integers.
>
> So, are you considering a use case like: the time series values should be
> [1, 1, 0, 0, 1, 1, 1, 0, 0...] but actually we get [0.99, 0.99, 0.01, 0,
> 1, 1, 0.999, 0, 0.01] (because of the precision of sensors)?
> And, what values do you want to save?
> (1) Save them as 1 and 0. Or,
> (2) Save them as 0.99, 0.01 as measured, but use a specific query API to
> return data like 1 and 0?
>
> My other question is: is there a general data type that can support the
> above cases?
>
> Best,
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> Julian Feinauer wrote on Tue, Oct 29, 2019 at 3:58 AM:
>
> > Hi all,
> >
> > I wanted to discuss a possible new feature I will call Rich Datatypes
> > (RDT) API in the following.
> > I worked a lot in the automotive industry and there is a broadly adopted
> > open Standard called ASAM MDF (https://www.asam.net/standards/detail/mdf/).
> > It is a format which is targeted at the efficient storage but at the same
> > time it supports VERY complex types (which are often used in automotive
> > controllers).
> >
> > Take something as simple as a boolean. We could store it as a boolean (as
> > java bool) in 1 bit.
> > BUT we have overall 4 possibilities:
> >
> >   *   No value is available for a timestamp (NULL / nothing stored)
> >   *   We had a successful request but the Controller does not know whether
> >       true or false (or had an internal error), this is a bit like
> >       Optional.isPresent() == false
> >   *   True
> >   *   False
> > In Java we could model it as a variable Optional<> x which could be null,
> > Opt

Re: Does anyone know how to monitor the access traffic of our website

2019-10-30 Thread Christofer Dutz
Hi Xiangdong,

That's the easy way ... I would strongly suggest not to do it.
I 100% agree with Justin, not to use an external tracking service. 

Chris



On 30.10.19, 09:51, "Xiangdong Huang" wrote:

Hi,

According to the Jira discussion, it sounds like Apache Infra can provide the data:
> If you need insight, please work with the infrastructure team - we can
> provide you aggregated views of data.

I think I can try to see what the data from Infra looks like.

> So far nothing official has been decided, but my reading of various list
> (and other JIRA issues) is that projects will at some point in the near
> future be asked not to use it.
> The PPMC can of course decide to use it until that point.

In my opinion, I'd like to enable tracking with Google Analytics for a
while, as IoTDB is now at an emerging stage and it may help us
improve the website.

 Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Justin Mclean wrote on Wed, Oct 30, 2019 at 4:27 PM:

> Hi,
>
> See https://issues.apache.org/jira/browse/LEGAL-470
>
> Note: "Yes, please avoid using Google Analytics or any other third-party
> analytics solution that ships our users' data elsewhere. If you need
> insight, please work with the infrastructure team - we can provide you
> aggregated views of data.”
>
> So far nothing official has been decided, but my reading of various list
> (and other JIRA issues) is that projects will at some point in the near
> future be asked not to use it.
>
> The PPMC can of course decide to use it until that point.
>
> Thanks,
> Justin
>
>
>




[jira] [Created] (IOTDB-283) Modify rules of datatype inference when creating schema automatically is enabled

2019-10-30 Thread Yanzhe An (Jira)
Yanzhe An created IOTDB-283:
---

 Summary: Modify rules of datatype inference when creating schema 
automatically is enabled
 Key: IOTDB-283
 URL: https://issues.apache.org/jira/browse/IOTDB-283
 Project: Apache IoTDB
  Issue Type: Bug
Reporter: Yanzhe An


The datatype inference rules are unsuitable for batch insertions when automatic
schema creation is enabled, because _BatchInsertPlan_ has a member
variable named _dataTypes_ which indicates the datatypes of each row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [jira] [Created] (IOTDB-279) Merge TsDigest into Statistics

2019-10-30 Thread Xiangdong Huang
+1.

---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Jialin Qiao (Jira) wrote on Wed, Oct 30, 2019 at 11:25 AM:

> Jialin Qiao created IOTDB-279:
> -
>
>  Summary: Merge TsDigest into Statistics
>  Key: IOTDB-279
>  URL: https://issues.apache.org/jira/browse/IOTDB-279
>  Project: Apache IoTDB
>   Issue Type: Improvement
> Reporter: Jialin Qiao
>
>
> As I observe, TsDigest is only the ByteBuffer format of Statistics, so why
> not merge them, which could make the code clear.
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>


Re: [jira] [Created] (IOTDB-281) Add powered by on the website

2019-10-30 Thread Xiangdong Huang
Should we hold a vote for that?

---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Jialin Qiao (Jira) wrote on Wed, Oct 30, 2019 at 11:31 AM:

> Jialin Qiao created IOTDB-281:
> -
>
>  Summary: Add powered by on the website
>  Key: IOTDB-281
>  URL: https://issues.apache.org/jira/browse/IOTDB-281
>  Project: Apache IoTDB
>   Issue Type: Improvement
> Reporter: Jialin Qiao
>
>
> We could add the logo of companies powered by IoTDB on the website.
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>


Re: [jira] [Created] (IOTDB-282) require "show version" query

2019-10-30 Thread Xiangdong Huang
+1 for the requirement.

---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Yingbo (Jira) wrote on Wed, Oct 30, 2019 at 4:57 PM:

> Yingbo created IOTDB-282:
> 
>
>  Summary: require "show version" query
>  Key: IOTDB-282
>  URL: https://issues.apache.org/jira/browse/IOTDB-282
>  Project: Apache IoTDB
>   Issue Type: New Feature
> Reporter: Yingbo
>  Fix For: 0.8.0
>
>
> To support different IoTDB versions in different production
> environments, the data access program needs a "show version" or "show
> parameters" query so that it can use compatible query syntax accordingly.
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>


[jira] [Created] (IOTDB-282) require "show version" query

2019-10-30 Thread Yingbo (Jira)
Yingbo created IOTDB-282:


 Summary: require "show version" query
 Key: IOTDB-282
 URL: https://issues.apache.org/jira/browse/IOTDB-282
 Project: Apache IoTDB
  Issue Type: New Feature
Reporter: Yingbo
 Fix For: 0.8.0


To support different IoTDB versions in different production
environments, the data access program needs a "show version" or "show
parameters" query so that it can use compatible query syntax accordingly.





Re: Does anyone know how to monitor the access traffic of our website

2019-10-30 Thread Xiangdong Huang
Hi,

According to the Jira discussion, it sounds like Apache Infra can provide the data:
> If you need insight, please work with the infrastructure team - we can
> provide you aggregated views of data.

I can try asking Infra what that data looks like.

> So far nothing official has been decided, but my reading of various lists
> (and other JIRA issues) is that projects will at some point in the near
> future be asked not to use it.
> The PPMC can of course decide to use it until that point.

In my opinion, we should enable tracking with Google Analytics for a
while, since IoTDB is at an early stage and the data may help us
improve the website.

 Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Justin Mclean wrote on Wed, Oct 30, 2019 at 4:27 PM:

> Hi,
>
> See https://issues.apache.org/jira/browse/LEGAL-470
>
> Note: "Yes, please avoid using Google Analytics or any other third-party
> analytics solution that ships our users' data elsewhere. If you need
> insight, please work with the infrastructure team - we can provide you
> aggregated views of data."
>
> So far nothing official has been decided, but my reading of various lists
> (and other JIRA issues) is that projects will at some point in the near
> future be asked not to use it.
>
> The PPMC can of course decide to use it until that point.
>
> Thanks,
> Justin
>
>
>


Re: Does anyone know how to monitor the access traffic of our website

2019-10-30 Thread Justin Mclean
Hi,

See https://issues.apache.org/jira/browse/LEGAL-470

Note: "Yes, please avoid using Google Analytics or any other third-party 
analytics solution that ships our users' data elsewhere. If you need insight, 
please work with the infrastructure team - we can provide you aggregated views 
of data."

So far nothing official has been decided, but my reading of various lists (and
other JIRA issues) is that projects will at some point in the near future be 
asked not to use it.

The PPMC can of course decide to use it until that point.

Thanks,
Justin




Re: Does anyone know how to monitor the access traffic of our website

2019-10-30 Thread Xiangdong Huang
Hi,

That's good.

I think we can also add a privacy policy page to the website and then add
Google Analytics and Baidu Analytics (for visitors in China).

So, shall we begin a vote?

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Kevin A. McGrail wrote on Wed, Oct 30, 2019 at 3:44 PM:

> I am sorry but I disagree with that statement on several points.  The
> ASF officially uses Google Analytics (see
> https://www.apache.org/foundation/policies/privacy.html#website-usage-privacy-policy)
> and I see zero reason to discourage a project from using these services.
>
> From my perspective, based on a number of projects with this same issue,
> the key things are:
>
> the PMC should discuss it and vote.  It's a project decision.
>
> there should be a clear policy about what is tracked and why.  ECharts,
> if I remember correctly, spent some time on this if you need an example.
>
> the ability to access the tracking account should be shared with the PMC
>
> Otherwise, in aggregate, this can be very valuable information and there
> is little reason I find not to collect it.
>
> Regards,
>
> KAM
>
> On 10/26/2019 7:28 AM, Justin Mclean wrote:
> > Hi,
> > There has been some discussion on other ASF lists on privacy and the use of
> > services like Google Analytics. While nothing official has been decided, the
> > discussion has tended to suggest not using these sorts of services.
> > Thanks,
> > Justin
> >
> > On Sat, 26 Oct 2019, 10:05 Julian Feinauer, <j.feina...@pragmaticminds.de>
> > wrote:
> >
> >> Hey,
> >>
> >> just go to the Flink dev list and ask there
> >>
> >> Julian
> >>
> >> On 26.10.19, 09:00, "Xiangdong Huang" wrote:
> >>
> >> Hi,
> >>
> >> At ACEU19, I noticed that the Flink community showed a figure
> >> illustrating the daily traffic of the official website.
> >>
> >> The community found some interesting things, like the traffic being very
> >> low in the first week of October (China's National Day) and at Chinese
> >> New Year...
> >>
> >> That finding itself is not important, but what interested me is that
> >> they can count the traffic of the website...
> >>
> >> So, does Apache have some method to do that? Or do we need to add some
> >> JavaScript like Google Analytics or Baidu Analytics?
> >>
> >> Best,
> >> ---
> >> Xiangdong Huang
> >> School of Software, Tsinghua University
> >>
> >>  黄向东
> >> 清华大学 软件学院
> >>
> >>
> >>
> --
> Kevin A. McGrail
> kmcgr...@apache.org
>
> Member, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>
>


Re: Does anyone know how to monitor the access traffic of our website

2019-10-30 Thread Kevin A. McGrail
I am sorry but I disagree with that statement on several points.  The
ASF officially uses Google Analytics (see
https://www.apache.org/foundation/policies/privacy.html#website-usage-privacy-policy)
and I see zero reason to discourage a project from using these services.

From my perspective, based on a number of projects with this same issue,
the key things are:

the PMC should discuss it and vote.  It's a project decision.

there should be a clear policy about what is tracked and why.  ECharts,
if I remember correctly, spent some time on this if you need an example.

the ability to access the tracking account should be shared with the PMC

Otherwise, in aggregate, this can be very valuable information and there
is little reason I find not to collect it.

Regards,

KAM

On 10/26/2019 7:28 AM, Justin Mclean wrote:
> Hi,
> There has been some discussion on other ASF lists on privacy and the use of
> services like Google Analytics. While nothing official has been decided, the
> discussion has tended to suggest not using these sorts of services.
> Thanks,
> Justin
>
> On Sat, 26 Oct 2019, 10:05 Julian Feinauer, 
> wrote:
>
>> Hey,
>>
>> just go to the Flink dev list and ask there
>>
>> Julian
>>
>> On 26.10.19, 09:00, "Xiangdong Huang" wrote:
>>
>> Hi,
>>
>> At ACEU19, I noticed that the Flink community showed a figure
>> illustrating the daily traffic of the official website.
>>
>> The community found some interesting things, like the traffic being very
>> low in the first week of October (China's National Day) and at Chinese
>> New Year...
>>
>> That finding itself is not important, but what interested me is that
>> they can count the traffic of the website...
>>
>> So, does Apache have some method to do that? Or do we need to add some
>> JavaScript like Google Analytics or Baidu Analytics?
>>
>> Best,
>> ---
>> Xiangdong Huang
>> School of Software, Tsinghua University
>>
>>  黄向东
>> 清华大学 软件学院
>>
>>
>>
-- 
Kevin A. McGrail
kmcgr...@apache.org

Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171