Hi Florian,

OK, for now let's just use double. When we run into problems regarding the precision, we can change it later.
Regarding the timestamp: currently we use the domain property "https://schema.org/DateTime" to mark a property as a timestamp in StreamPipes (e.g. {"time": 1592462654853}). We have to convert those into SQL DateTimes. I did this already for the MySQL sink with "java.sql.Timestamp" [1] (line 142). I think this date/time format is standardized, so we do not have to choose our own.

The question regarding the time zone is very good, because so far we do not represent it in the event. Do you have any idea how we could do that?

Philipp

[1] https://github.com/apache/incubator-streampipes-extensions/blob/dev/streampipes-sinks-databases-jvm/src/main/java/org/apache/streampipes/sinks/databases/jvm/mysql/Mysql.java

> On 17. Jun 2020, at 21:27, Florian Micklich <[email protected]> wrote:
>
> Hi Philipp,
>
> this is really hard to say. Is there another way to represent a number
> type in Java instead of double?
> I think in most cases the numerical values are within the range of 15
> decimal places. All other cases are exotic and cannot easily be
> handled by other programs as well.
>
> But if this limit is exceeded, the question is really what should
> happen. Ultimately there are two options: loss of information, or
> dragging the value along as a string. Maybe there is the possibility
> to read out the precision before importing and then convert it into
> double. However, I am not yet aware of any cases in which more than 15
> decimal places have been used.
>
> About timestamps: this is also a difficult decision. I think we should
> not use the domain timestamp together with the UNIX timestamp.
> Furthermore, if we use the term date/time, we should use the term
> timestamp and also the domain.
> Is there a domain that represents a UNIX timestamp?
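As a side note on the conversion discussed above, a minimal sketch (class and method names here are illustrative, not StreamPipes code): the epoch-millisecond value itself carries no time zone, so the zone only matters when the value is rendered; formatting it as ISO 8601 with an explicit offset keeps that information in the string.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampConversionSketch {

  // Epoch millis -> SQL timestamp, as done in the MySQL sink
  static Timestamp toSqlTimestamp(long epochMillis) {
    return new Timestamp(epochMillis);
  }

  // Render the zone-free epoch value as ISO 8601 in UTC, with the
  // offset made explicit so no zone information is lost in transit.
  static String toIsoUtc(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis)
        .atZone(ZoneOffset.UTC)
        .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
  }

  public static void main(String[] args) {
    long t = 1592462654853L;               // {"time": 1592462654853}
    System.out.println(toSqlTimestamp(t)); // rendered in the JVM's local zone
    System.out.println(toIsoUtc(t));       // 2020-06-18T06:44:14.853Z
    // A named zone can be attached the same way:
    System.out.println(Instant.ofEpochMilli(t)
        .atZone(ZoneId.of("Europe/Berlin"))
        .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME));
  }
}
```

Note that `new Timestamp(millis).toString()` renders in the JVM's default zone, which is one source of the confusion discussed here.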
> This way we can recognize the type and know when to transform the
> values in the sink process, and the user also has the option to use
> either a UNIX timestamp or a date/time timestamp during import.
>
> The "real" timestamp as date/time is just a string with a specific
> pattern. We "only" have to decide which pattern we recognize as a
> timestamp. The main problem is that there are a lot of different ways
> to represent a timestamp, and a lot of other problems [1]. We should
> start with ISO 8601 [2].
> The other question is: what is the result if we cast a UNIX timestamp
> to a timestamp? The definition uses UTC as the basic time zone. Or is
> this transferred automatically to the local time zone? [3]
> So in the end we have to support timestamps together with time zones,
> I think. But I think this is the last step.
>
> Greetings
> Florian
>
> [1] https://codeblog.jonskeet.uk/2015/05/05/common-mistakes-in-datetime-formatting-and-parsing/
> [2] https://en.wikipedia.org/wiki/ISO_8601
> [3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158862916#unix-timestamp
>
> On Wednesday, 17.06.2020 at 07:21 +0000, Philipp Zehnder wrote:
>> Hi Florian,
>>
>> the information in the link looks good, as well as the type
>> validation in your method.
>>
>> Do you think we will get any problems because we have fewer data
>> types in StreamPipes (e.g. we just support double)? So we have to
>> change the types and might lose information.
>>
>> And how about timestamps? In StreamPipes we use UNIX timestamps in
>> milliseconds. In JDBC we have to convert them to dates. My suggestion
>> would be to use date/time as the default type for timestamps. What do
>> you think?
>>
>> Philipp
>>
>>> On 17. Jun 2020, at 08:22, Florian Micklich <[email protected]>
>>> wrote:
>>>
>>> Hi Philipp,
>>>
>>> I think we should handle the mapping via the following standard
>>> values returned by the JDBC meta information [1].
>>> This way an easy validation between column types in the DB and
>>> event properties can be done. Also, this should be valid for all
>>> databases using JDBC.
>>>
>>> If it is required to map e.g. DOUBLE to PG_DOUBLE_PRECISION, this
>>> should be done based on referencing the above list.
>>>
>>> What do you think?
>>> This way it is also easy for adapters to get the correct resource
>>> type.
>>>
>>> Here is also a possible solution for the validation using this
>>> approach:
>>>
>>> protected void validateTable() throws SpRuntimeException {
>>>   // TODO: Add validation of an existing table
>>>   HashMap<String, Integer> columnsMap = new HashMap<>();
>>>
>>>   try {
>>>     ResultSet column = c.getMetaData().getColumns(null, null, tableName, null);
>>>     while (column.next()) {
>>>       String columnName = column.getString("COLUMN_NAME");
>>>       int dataType = column.getInt("DATA_TYPE");
>>>       columnsMap.put(columnName, dataType);
>>>     }
>>>   } catch (SQLException e) {
>>>     throw new SpRuntimeException(e.getMessage());
>>>   }
>>>
>>>   // TODO: create a corresponding map from the event properties, like:
>>>   HashMap<String, Integer> eventMap = new HashMap<>();
>>>
>>>   if (!columnsMap.equals(eventMap)) {
>>>     throw new SpRuntimeException("Table '" + tableName
>>>         + "' does not match the event properties");
>>>   }
>>> }
>>>
>>> Greetings
>>> Florian
>>>
>>> [1] https://www.tutorialspoint.com/java-resultsetmetadata-getcolumntype-method-with-example
>>>
>>> On Tuesday, 16.06.2020 at 23:21 +0200, Philipp Zehnder wrote:
>>>> Hi Florian,
>>>>
>>>> great work with the wiki [1]. Indeed, I think this is a very
>>>> important point and we should clarify it, because it can get quite
>>>> confusing with all the different possibilities.
>>>> Your description helps a lot! Currently, we do not support all
>>>> data types.
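To make the comparison in validateTable() concrete, the event-side map would need a lookup from StreamPipes runtime types to the java.sql.Types constants. A minimal sketch (the XSD URIs and helper names are illustrative assumptions, not the actual StreamPipes API):

```java
import java.sql.Types;
import java.util.HashMap;
import java.util.Map;

public class EventTypeMapSketch {

  // Hypothetical lookup from a StreamPipes runtime type (XSD URI) to
  // the standard JDBC type constant in java.sql.Types. The real sink
  // would derive the runtime types from the event schema's properties.
  static final Map<String, Integer> RUNTIME_TO_JDBC = new HashMap<>();
  static {
    RUNTIME_TO_JDBC.put("http://www.w3.org/2001/XMLSchema#integer", Types.INTEGER);
    RUNTIME_TO_JDBC.put("http://www.w3.org/2001/XMLSchema#long",    Types.BIGINT);
    RUNTIME_TO_JDBC.put("http://www.w3.org/2001/XMLSchema#float",   Types.FLOAT);
    RUNTIME_TO_JDBC.put("http://www.w3.org/2001/XMLSchema#double",  Types.DOUBLE);
    RUNTIME_TO_JDBC.put("http://www.w3.org/2001/XMLSchema#string",  Types.VARCHAR);
    RUNTIME_TO_JDBC.put("http://www.w3.org/2001/XMLSchema#boolean", Types.BOOLEAN);
  }

  // Builds the event-side map that validateTable() would compare
  // against the columnsMap read from DatabaseMetaData.getColumns().
  static Map<String, Integer> buildEventMap(Map<String, String> runtimeTypesByProperty) {
    Map<String, Integer> eventMap = new HashMap<>();
    runtimeTypesByProperty.forEach(
        (name, runtimeType) -> eventMap.put(name, RUNTIME_TO_JDBC.get(runtimeType)));
    return eventMap;
  }
}
```

One caveat with a plain equals() comparison: some databases report vendor-specific DATA_TYPE values (e.g. NUMERIC instead of DOUBLE on Postgres), so the mapping table would need per-database entries, as suggested above for PG_DOUBLE_PRECISION.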
>>>> I think it makes sense to define the mappings between StreamPipes
>>>> types and database types for all available values, both for the
>>>> sink and for adapters.
>>>>
>>>> Regarding the capital letters in the names: is this just a problem
>>>> with Postgres, or might there also be other JDBC databases with
>>>> the same issue?
>>>> And what about special characters, do we have to filter them out
>>>> too?
>>>>
>>>> I also like the idea with the inheritance of the JdbcClient.
>>>> What do you think, should we move all sinks using JDBC into one
>>>> package?
>>>>
>>>> Your changes look good. I would say it makes sense to merge them
>>>> into dev and try to use the components as much as possible to find
>>>> possible problems.
>>>> I would also suggest implementing an adapter for each of the
>>>> sinks. It makes the testing and usage of the databases much
>>>> easier when we can use StreamPipes to write data into the database
>>>> and then use an adapter to read the data again.
>>>>
>>>> Philipp
>>>>
>>>> [1] https://cwiki.apache.org/confluence/display/STREAMPIPES/JDBC+Database+Client
>>>>
>>>>> On 12. Jun 2020, at 13:54, Florian Micklich <[email protected]> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I created a ticket, STREAMPIPES-162, and also found a solution to
>>>>> fix the described issue.
>>>>>
>>>>> I tested it with the Postgres and MySQL sinks.
>>>>> If I get a short OK, I would push the rewrite_jdbcclient branch
>>>>> into dev.
>>>>>
>>>>> Greetings
>>>>> Florian
>>>>>
>>>>> On Monday, 08.06.2020 at 12:57 +0200, Florian Micklich wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I just finished rewriting the JDBC class and added an extending
>>>>>> Postgres JDBC client to handle Postgres and PostGIS as a sink
>>>>>> with more options.
>>>>>> I tested it with the MySQL sink already, and everything is
>>>>>> working with the parent JDBCClient class.
>>>>>>
>>>>>> During this process I also "outsourced" the SQLAttributeEnum out
>>>>>> of the JDBC class and changed some names and the layout for
>>>>>> better understanding:
>>>>>>
>>>>>> public enum SqlAttribute {
>>>>>>   // DEFAULT
>>>>>>   INTEGER("INT"),
>>>>>>   LONG("BIGINT"),
>>>>>>   FLOAT("FLOAT"),
>>>>>>   DOUBLE("DOUBLE"),
>>>>>>   STRING("VARCHAR(255)"),
>>>>>>   BOOLEAN("BOOLEAN"),
>>>>>>   // MYSQL
>>>>>>   MYSQL_DATETIME("DATETIME"),
>>>>>>   // POSTGRES / POSTGIS
>>>>>>   PG_DOUBLE("NUMERIC");
>>>>>>
>>>>>>   private final String sqlType;
>>>>>>
>>>>>>   SqlAttribute(String sqlType) {
>>>>>>     this.sqlType = sqlType;
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> I hope this is OK the way I did it.
>>>>>>
>>>>>> In this case Postgres uses the type NUMERIC instead of the
>>>>>> non-existing DOUBLE type, as an example.
>>>>>> So later on I will also add a Postgres timestamp type, like it
>>>>>> was done in the MySQL sink last week.
>>>>>>
>>>>>> But there is one thing I need some assistance with, and it took
>>>>>> a while to find out why and what is happening:
>>>>>>
>>>>>> I used the randomDataset adapter and the Postgres sink.
>>>>>>
>>>>>> The database, schema, and table are created as expected, but
>>>>>> during save I get all the time:
>>>>>>
>>>>>> Table 'testtable' was unexpectedly not found and gets recreated.
>>>>>> ERROR: column "randomText" of relation "testtable" does not exist
>>>>>>
>>>>>> So the first message is a little bit misleading, because the
>>>>>> table exists but can't be found, and it will be recreated in an
>>>>>> endless loop.
>>>>>> The reason is this statement built during the
>>>>>> generatePreparedStatement method:
>>>>>>
>>>>>> INSERT INTO testschema.testtable ( "randomText", "randomNumber",
>>>>>> "count", "timestamp" ) VALUES ( 'c', 69, 83531, 1591611708415 );
>>>>>>
>>>>>> The problem here is that Postgres uses lower case for the column
>>>>>> names. So instead of randomText, the name has to be randomtext:
>>>>>>
>>>>>> INSERT INTO testschema.testtable ( "randomtext", "randomnumber",
>>>>>> "count", "timestamp" ) VALUES ( 'c', 69, 83531, 1591611708415 );
>>>>>>
>>>>>> Can someone give me a hint where to add .toLowerCase at the
>>>>>> correct position to get a quick fix?
>>>>>>
>>>>>> I am working in the branch feature/jdbc_rewrite and the
>>>>>> PostgresJDBCClient class.
>>>>>>
>>>>>> Greetings
>>>>>> Florian
>>
>> .........................................................
>> M. Sc. Philipp Zehnder
>> Wissenschaftlicher Mitarbeiter | Research Scientist
>> Information Process Engineering (IPE)
>>
>> FZI Forschungszentrum Informatik
>> Haid-und-Neu-Str. 10–14
>> 76131 Karlsruhe, Germany
>> Tel.: +49 721 9654-805
>> Fax: +49 721 9654-806
>>
>> [email protected]
>> https://www.fzi.de/mitarbeiter/philipp-zehnder
>>
>> .........................................................
>> FZI Forschungszentrum Informatik
>> Stiftung des bürgerlichen Rechts
>> Stiftung Az: 14-0563.1 Regierungspräsidium Karlsruhe
>> Vorstand: Prof. Dr. Andreas Oberweis, Jan Wiesenberger, Prof. Dr.-Ing. J. Marius Zöllner
>> Vorsitzender des Kuratoriums: Ministerialdirigent Günther Leßnerkraus
>> .........................................................
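On the .toLowerCase question: Postgres folds unquoted identifiers to lower case but preserves case inside double quotes, so if CREATE TABLE left the names unquoted while the INSERT quotes them, "randomText" will never match the actual column randomtext. A minimal sketch of the normalization idea (class and method names here are illustrative, not the code in feature/jdbc_rewrite; where exactly to hook it in depends on how the statements are built):

```java
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LowerCaseColumnsSketch {

  // Normalize every identifier once, centrally, before it is used in
  // any DDL or DML statement, so CREATE and INSERT always agree.
  // (The alternative is to quote identifiers identically in both.)
  static String normalize(String identifier) {
    return identifier.toLowerCase(Locale.ROOT);
  }

  // Example: building the quoted column list of an INSERT statement
  // from the event's runtime names.
  static String columnList(Stream<String> runtimeNames) {
    return runtimeNames
        .map(LowerCaseColumnsSketch::normalize)
        .map(n -> "\"" + n + "\"")
        .collect(Collectors.joining(", "));
  }

  public static void main(String[] args) {
    System.out.println(columnList(
        Stream.of("randomText", "randomNumber", "count", "timestamp")));
    // "randomtext", "randomnumber", "count", "timestamp"
  }
}
```

Using Locale.ROOT avoids locale-dependent surprises (e.g. the Turkish dotless i) when lower-casing identifiers.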
