Hi, John.

Sorry for the delay. I am changing jobs and have been very busy :( I will
try to answer your questions :)

*> In the Employee example there is a field called 'dateOfBirth'. I tried
to map that field with the UNIXTIME_MICROS datatype of Kudu (I intuitively
assumed this is a date.). However, in the java world the Employee field is
a Long value and the kudu datatype is a Timestamp. So, I was wondering
whether I should force the usage of the UNIXTIME_MICROS datatype for this
field or just use a LONG datatype in Kudu.*

Avro 1.8 introduced "Logical Types", so there is now a "date" type with
an underlying "int" [1]. This is the first time I have read about them,
because they did not exist before the last Avro version upgrade. I would
suggest ignoring "dates" for now and mapping dateOfBirth as a long, since in
any case the Avro value is counted from the Unix epoch. After this first
approach, a design improvement would be great, though :)

- Would it be good to have a "timestamp" type in the mapping, so that
KuduStore converts between the entity's long field <-> Kudu timestamp storage?
- Is there any other approach?
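
To make the "timestamp" idea concrete, here is a minimal sketch of the
conversions such a mapping type would need. The class and method names are
hypothetical (they exist in neither Gora nor Kudu), and it assumes the
entity's long field holds epoch milliseconds. It relies only on two facts:
UNIXTIME_MICROS holds microseconds since the Unix epoch, and the Avro "date"
logical type holds days since the Unix epoch.

```java
/** Hypothetical helper, not part of Gora or Kudu. */
final class TimestampMapping {
  private static final long MICROS_PER_MILLI = 1_000L;
  private static final long MICROS_PER_DAY = 86_400_000_000L;

  private TimestampMapping() {}

  /** Entity long (assumed to be epoch millis) -> value for a UNIXTIME_MICROS column. */
  static long millisToMicros(long epochMillis) {
    // multiplyExact fails loudly on overflow instead of silently wrapping around.
    return Math.multiplyExact(epochMillis, MICROS_PER_MILLI);
  }

  /** Value read from a UNIXTIME_MICROS column -> entity long (epoch millis). */
  static long microsToMillis(long epochMicros) {
    // floorDiv rounds toward negative infinity, so dates before 1970 round consistently.
    return Math.floorDiv(epochMicros, MICROS_PER_MILLI);
  }

  /** Avro "date" logical type (int days since epoch) -> micros at midnight UTC. */
  static long avroDateToMicros(int epochDays) {
    return Math.multiplyExact((long) epochDays, MICROS_PER_DAY);
  }
}
```

If the field is simply mapped as a long, none of this is needed. The sketch
only shows that the timestamp mapping would be a small, lossless conversion
layer in one direction and a truncating one in the other.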


*> What is Gora's policy regarding flush()? *
*> KuduClient has multiple flushing modes
<https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html>
and can also set a time interval
<https://kudu.apache.org/releases/1.2.0/apidocs/org/apache/kudu/client/KuduSession.html#setFlushInterval-int->
for automatic flush.*
*> Should these behaviors be configurable using the gora.properties file, or
should we just use the default configurations?*

What we do in HBase is configure an autoflush option in gora.properties [2],
which is used when the Table is instantiated, but at the same time we
implement the flush() method to force the flush [3]. I would suggest
following that example, but adding the flushing options of Kudu. What
flushing mode (and time interval, if it applies) do you suggest?
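
As a rough illustration of making this configurable, the sketch below reads a
flush mode and a flush interval from a Properties object, such as the one
loaded from gora.properties. The property keys, the class name and the
fallback values are hypothetical, chosen only for this example. The local enum
just mirrors the names of Kudu's SessionConfiguration.FlushMode values, so the
snippet compiles without the Kudu client. The interval is assumed to be in
milliseconds.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical settings holder, not part of Gora or Kudu. */
final class KuduFlushSettings {
  /** Mirrors the value names of Kudu's SessionConfiguration.FlushMode. */
  enum FlushMode { AUTO_FLUSH_SYNC, AUTO_FLUSH_BACKGROUND, MANUAL_FLUSH }

  // Hypothetical keys; no such keys are defined by Gora today.
  static final String FLUSH_MODE_KEY = "gora.datastore.kudu.flushMode";
  static final String FLUSH_INTERVAL_KEY = "gora.datastore.kudu.flushInterval";

  final FlushMode mode;
  final int intervalMillis; // assumed unit: milliseconds

  private KuduFlushSettings(FlushMode mode, int intervalMillis) {
    this.mode = mode;
    this.intervalMillis = intervalMillis;
  }

  /** Parses the settings, using this sketch's own fallbacks when a key is absent. */
  static KuduFlushSettings from(Properties props) {
    String rawMode = props.getProperty(FLUSH_MODE_KEY, FlushMode.AUTO_FLUSH_SYNC.name());
    // valueOf throws IllegalArgumentException for an unknown mode name.
    FlushMode mode = FlushMode.valueOf(rawMode.trim().toUpperCase(Locale.ROOT));
    int interval = Integer.parseInt(props.getProperty(FLUSH_INTERVAL_KEY, "1000").trim());
    if (interval < 0) {
      throw new IllegalArgumentException(FLUSH_INTERVAL_KEY + " must not be negative");
    }
    return new KuduFlushSettings(mode, interval);
  }
}
```

In a real store, the parsed values would be passed to the session with
KuduSession.setFlushMode(...) and KuduSession.setFlushInterval(...), while
flush() in the store would still force a flush regardless of the mode.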

*> Also, while reviewing the datastore interface I noticed this method
'getPartitions(Query<K, T> query)'. What is the expected behavior of this
method? Should I use the partition definition in the xml mapping file for
this?*

The method getPartitions(Query) is related to Hadoop. Apache Gora
integrates with Hadoop by implementing a custom Map and Reduce that allow
reading and writing entities directly.
You can take a look at HBase's implementation [4], which relies on
o.a.h.hbase.mapreduce.TableInputFormatBase
[5] to compute the splits (start key---end key) with the location of each
split, and creates a collection of partitions from them [6].

So, if Kudu allows computation to run on local Kudu splits, then
this method does the preparation needed to "send the computation
to where the data is locally".

In any case, you can see that:

   - MongoDB store implementation does not implement splitting [7]
   - Cassandra store implementation does not implement splitting [8]
   - Aerospike store implementation does not implement splitting [9]
   - Accumulo store implementation *does* implement splitting [10]

If Kudu has a method to get the different splits for a table and their
locations, then you will be able to implement the full feature.
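
To illustrate the idea (not the HBase or Kudu code itself), here is a small
sketch of what the preparation amounts to: given the splits of a table, each
with a key range and the hosts that hold it, keep the splits that overlap the
query range and clip them to that range. All names are hypothetical. Keys are
plain strings, and both the split and the query ranges are assumed to be
[start, end), with null meaning "unbounded".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not part of Gora, HBase or Kudu. */
final class SplitPlanner {
  /** One key range [startKey, endKey) plus the hosts that store it. */
  static final class Split {
    final String startKey;        // inclusive; null = unbounded
    final String endKey;          // exclusive; null = unbounded
    final List<String> locations; // hosts holding this range

    Split(String startKey, String endKey, List<String> locations) {
      this.startKey = startKey;
      this.endKey = endKey;
      this.locations = locations;
    }
  }

  private SplitPlanner() {}

  /** Returns one partition per split that overlaps [queryStart, queryEnd). */
  static List<Split> partitionsFor(String queryStart, String queryEnd, List<Split> splits) {
    List<Split> partitions = new ArrayList<>();
    for (Split s : splits) {
      // Split starts at or after the end of the query range: no overlap.
      if (queryEnd != null && s.startKey != null && s.startKey.compareTo(queryEnd) >= 0) continue;
      // Split ends at or before the start of the query range: no overlap.
      if (queryStart != null && s.endKey != null && s.endKey.compareTo(queryStart) <= 0) continue;
      // Clip the split to the query range, keeping its locations for data locality.
      partitions.add(new Split(later(queryStart, s.startKey), earlier(queryEnd, s.endKey), s.locations));
    }
    return partitions;
  }

  /** Larger of two lower bounds; null (unbounded) loses to any real key. */
  private static String later(String a, String b) {
    if (a == null) return b;
    if (b == null) return a;
    return a.compareTo(b) >= 0 ? a : b;
  }

  /** Smaller of two upper bounds; null (unbounded) loses to any real key. */
  private static String earlier(String a, String b) {
    if (a == null) return b;
    if (b == null) return a;
    return a.compareTo(b) <= 0 ? a : b;
  }
}
```

Each resulting partition carries its locations, which is the information
Hadoop needs to schedule a task close to the data.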

This is Hadoop related and it is not trivial. I haven't elaborated much, so
if you find you need more information let me know :)


About queries, what I can tell you is that HBase only implements "Start key"
+ "End key" because it has only 2 operations: "get" and "scan", and the
querying is for the "scan" operation, where you want an interval (or all) of
the rows. Does Kudu have more querying functionality?

On another topic, I am trying to install Kudu in standalone mode (all in 1
node). Do you use a Cloudera installation or do you have a standalone
installation? How do you do it? I found some instructions, but they talk
about compiling Kudu [11]. I was looking for something like HBase, where it
is just unzip + execute "hbase start".


Good job and thank you!! :)

Regards,

Alfonso Nishikawa


[1] - https://avro.apache.org/docs/1.8.0/spec.html#Logical+Types
[2] -
https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L175
[3] -
https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L458
[4] -
https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L472
[5] -
https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L479
[6] -
https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L517
[7] -
https://github.com/apache/gora/blob/apache-gora-0.9/gora-mongodb/src/main/java/org/apache/gora/mongodb/store/MongoStore.java#L533
[8] -
https://github.com/apache/gora/blob/apache-gora-0.9/gora-cassandra/src/main/java/org/apache/gora/cassandra/store/CassandraStore.java#L292
[9] -
https://github.com/apache/gora/blob/apache-gora-0.9/gora-aerospike/src/main/java/org/apache/gora/aerospike/store/AerospikeStore.java#L369
[10] -
https://github.com/apache/gora/blob/apache-gora-0.9/gora-accumulo/src/main/java/org/apache/gora/accumulo/store/AccumuloStore.java#L902
[11] - https://kudu.apache.org/docs/installation.html


On Mon, 8 Jul 2019 at 3:42, John Mora (<jhnmora...@gmail.com>)
wrote:

> Hi all.
>
> As every week I updated my report in the Wiki[1]. Also, I pushed my last
> commits to my branch [2]. Please give it a look if you have time.
>
> This week, I will continue working on the Queries implementation,
> please reach out to me if you have any suggestions.
>
> Also, while reviewing the datastore interface I noticed this method
> 'getPartitions(Query<K, T> query)'. What is the expected behavior of this
> method? Should I use the partition definition in the xml mapping file for
> this?
>
> Cheers,
> John.
>
> [1]
> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>
>
> On Sun, 30 Jun 2019 at 16:56, John Mora (<jhnmora...@gmail.com>)
> wrote:
>
>> Hi all.
>>
>> I received my first evaluation from the Google Summer of Code program
>> with a positive result. Thanks so much for your support and confidence in
>> the project and me.
>>
>> I updated my report of this week in the Wiki[1]. Also, I pushed my last
>> commits to my branch [2].
>>
>> This week, I will be reviewing the serialization/deserialization
>> process in order to identify optimizations specific to Kudu, because I
>> used generic methods from other backends which could probably be better
>> tuned for Kudu. Also, I will start working on the Queries implementation.
>>
>> BTW, I added a question to the wiki about Date types. Please give it a
>> look if you have time.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>
>> Cheers,
>> John
>>
>> On Thu, 27 Jun 2019 at 21:02, John Mora (<jhnmora...@gmail.com>)
>> wrote:
>>
>>> Hi Carlos.
>>>
>>> Thanks for the reminder. I submitted the form yesterday. :D
>>>
>>> Best,
>>> John.
>>>
>>> On Thu, 27 Jun 2019 at 17:34, carlos muñoz (<carlosr...@gmail.com>)
>>> wrote:
>>>
>>>> Hi John
>>>>
>>>> The first Google Summer of Code evaluation is due on June 28th. Please
>>>> make sure you submit your Mentors' evaluation on time.
>>>>
>>>> Regards,
>>>> Carlos
>>>>
>>>> On Sun, 23 Jun 2019 at 18:29, John Mora (<jhnmora...@gmail.com>)
>>>> wrote:
>>>>
>>>>> Hi all.
>>>>>
>>>>> FYI, I updated my report of this week on the Wiki[1]. Also, I pushed
>>>>> my last commits to my branch [2].
>>>>>
>>>>> As I mentioned in the reports, I would like to know how datastores deal
>>>>> with flush(). Should it always be executed manually?
>>>>>
>>>>> Finally, this week I will be implementing object
>>>>> serialization/deserialization in the methods put, get, delete, exists. Do
>>>>> you have any suggestions on how to proceed with this task?
>>>>>
>>>>> Footnote: Thanks for the feedback Carlos, I fixed the problem.
>>>>>
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>>
>>>>> Cheers,
>>>>> John
>>>>>
>>>>>
>>>>> On Mon, 17 Jun 2019 at 22:58, carlos muñoz (<carlosr...@gmail.com>)
>>>>> wrote:
>>>>>
>>>>>> Hi John
>>>>>>
>>>>>> Your last changes look good to me. Keep it up. But, I noticed that
>>>>>> you have created an Enumeration for datatypes, which is very similar
>>>>>> to the kudu-client's [2]. Probably you should replace [1] with [2] in
>>>>>> order to avoid code duplication.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/Column.java#L76
>>>>>> [2] https://kudu.apache.org/apidocs/org/apache/kudu/Type.html
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Carlos
>>>>>>
>>>>>> On Sat, 15 Jun 2019 at 12:01, John Mora (<jhnmora...@gmail.com>)
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all.
>>>>>>>
>>>>>>> I updated my report of this week on the Wiki[1]. I noticed that my
>>>>>>> code is lacking some javadoc documentation, so I think I will be working
>>>>>>> on that this week. I would also like to enable and check the schema
>>>>>>> management tests (createSchema, existsSchema, etc.).
>>>>>>>
>>>>>>> [1]
>>>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>
>>>>>>> Cheers,
>>>>>>> John.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 11 Jun 2019 at 0:11, John Mora (<jhnmora...@gmail.com>)
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Alfonso.
>>>>>>>>
>>>>>>>> Thanks so much for your feedback. I am working on your comments.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> John
>>>>>>>>
>>>>>>>> On Mon, 10 Jun 2019 at 16:11, Alfonso Nishikawa (<
>>>>>>>> alfonso.nishik...@gmail.com>) wrote:
>>>>>>>>
>>>>>>>>> Hi, John.
>>>>>>>>>
>>>>>>>>> Regarding your questions at the report [1]:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - How to represent partitioning configurations on the mapping
>>>>>>>>>    file.
>>>>>>>>>
>>>>>>>>> This was discussed in other emails, wasn't it? :)
>>>>>>>>>
>>>>>>>>>    - KuduTestHarness requires the Maven plugin os-maven-plugin,
>>>>>>>>>    which needs Maven 3.1.1+, is it a problem for Apache Gora?
>>>>>>>>>
>>>>>>>>> I believe it is not a problem. My Ubuntu comes with 3.6.0, well
>>>>>>>>> beyond 3.1.1, and I assume everyone uses a fairly recent Maven 3 :)
>>>>>>>>>
>>>>>>>>> [1] -
>>>>>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Alfonso Nishikawa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, 10 Jun 2019 at 21:07, Alfonso Nishikawa (<
>>>>>>>>> alfonso.nishik...@gmail.com>) wrote:
>>>>>>>>>
>>>>>>>>>> Hi, John.
>>>>>>>>>>
>>>>>>>>>> Thank you!
>>>>>>>>>> Things I have seen:
>>>>>>>>>>
>>>>>>>>>> - The version of a maven dependency [1] should go in the
>>>>>>>>>> Dependency Management of the root pom [2]. Same for [3] and from
>>>>>>>>>> there onward; the version should not be set there.
>>>>>>>>>> - Set test dependencies' scope to test, at [4] and from there onward.
>>>>>>>>>> - Set the indentation to 2 spaces for the pom [5]
>>>>>>>>>> - Missing "t" in "localhost" at [6].
>>>>>>>>>> - Port 13 for Kudu? That is the "Daytime Protocol" (RFC 867) and you
>>>>>>>>>> will need root permission to run it. The default port for Kudu is
>>>>>>>>>> 7051, isn't it?
>>>>>>>>>> - I would ask you to add the same functionality to load the
>>>>>>>>>> mapping from configuration as in HBase's store [7] to your KuduStore
>>>>>>>>>> [8]. This will have implications for your readMapping at [9], so take
>>>>>>>>>> a look at the one for HBase at [10]
>>>>>>>>>> - I know it is in other backends, but avoid RuntimeExceptions (at
>>>>>>>>>> least in Java, since we have the checked ones) like in [11]. You can
>>>>>>>>>> wrap them in GoraException. An example is [12]
>>>>>>>>>>
>>>>>>>>>> And nothing more :)
>>>>>>>>>> Keep going, good job.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1] -
>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L98
>>>>>>>>>> [2] -
>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/pom.xml#L890
>>>>>>>>>> [3] -
>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L121
>>>>>>>>>> [4] -
>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L180
>>>>>>>>>> [5] -
>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml
>>>>>>>>>> [6] -
>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/test/resources/gora.properties#L18
>>>>>>>>>> [7] -
>>>>>>>>>> https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L92
>>>>>>>>>> [8] -
>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/store/KuduStore.java#L53
>>>>>>>>>> [9] -
>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/KuduMappingBuilder.java#L81
>>>>>>>>>> [10] -
>>>>>>>>>> https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L822
>>>>>>>>>> [11] -
>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/KuduMappingBuilder.java#L141
>>>>>>>>>> [12] -
>>>>>>>>>> https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L268
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Alfonso Nishikawa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, 8 Jun 2019 at 20:26, John Mora (<
>>>>>>>>>> jhnmora...@gmail.com>) wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all.
>>>>>>>>>>>
>>>>>>>>>>> I have just updated my weekly reports on Cwiki [1]. This next
>>>>>>>>>>> week I think I should focus on the create schema operation and on
>>>>>>>>>>> solving the issue of the partitioning configurations in the mapping
>>>>>>>>>>> file.
>>>>>>>>>>>
>>>>>>>>>>> Please let me know if you have suggestions, my last commits are
>>>>>>>>>>> available here [2]
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
