Re: Add datastore for Elasticsearch. Outreachy Week 10 Report

2021-02-15 Thread John Mora
Hi Maria,

Thanks for the update.

1) I think you can copy the content from _id to a manually created field,
say 'gora_id', using copy_to.

https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html

However, I have not tried it yet, so I am not sure whether it will work.

Alternatively, you can manually copy the value of the key to a field that
can be range queried in the put method of the datastore.
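
A minimal sketch of the second option, assuming the key is stored again in a
keyword-mapped field at put time. The class and method names are hypothetical,
and 'gora_id' is only the field name suggested above. The code uses plain Java
and only builds the document and the standard range query body; sending them
with a client is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: GoraIdSketch and its methods are hypothetical names, not Gora API. */
final class GoraIdSketch {

    // Field name taken from the suggestion above; assumed to be mapped as "keyword".
    static final String KEY_FIELD = "gora_id";

    /** Returns a copy of the document source with the key duplicated into KEY_FIELD. */
    static Map<String, Object> withKeyCopy(String key, Map<String, Object> source) {
        Map<String, Object> doc = new LinkedHashMap<>(source);
        doc.put(KEY_FIELD, key);
        return doc;
    }

    /**
     * Builds the JSON body of a range query over KEY_FIELD.
     * Both bounds are inclusive (gte/lte), which is an assumption about the
     * desired semantics. A null bound leaves that side open.
     */
    static String rangeQueryJson(String startKey, String endKey) {
        if (startKey == null && endKey == null) {
            return "{\"query\":{\"match_all\":{}}}";
        }
        StringBuilder bounds = new StringBuilder();
        if (startKey != null) {
            bounds.append("\"gte\":\"").append(escape(startKey)).append('"');
        }
        if (endKey != null) {
            if (bounds.length() > 0) {
                bounds.append(',');
            }
            bounds.append("\"lte\":\"").append(escape(endKey)).append('"');
        }
        return "{\"query\":{\"range\":{\"" + KEY_FIELD + "\":{" + bounds + "}}}}";
    }

    // Minimal escaping of backslashes and double quotes only; not a full JSON encoder.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("url", "http://example.org/");
        System.out.println(withKeyCopy("org.example/", source));
        System.out.println(rangeQueryJson("a", "m"));
    }
}
```

The same body could then serve both a search and a delete-by-query request,
since both accept a query of this form.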

2) In some databases you can split your data into partitions, generally
defining ranges for the primary key.

Kudu is an example of this:
https://kudu.apache.org/docs/schema_design.html#range-partitioning

In this case, the getPartitions method should split a query using the
existing partition ranges:
Kudu example:
https://github.com/apache/gora/blob/master/gora-kudu/src/main/java/org/apache/gora/kudu/store/KuduStore.java#L383

If the database does not support partitioning, this method only returns a
single partition (the whole table/collection).
This is probably the implementation that you saw.

I think Elasticsearch does not support partitioning. In that case your
implementation is fine, but I am not an expert in Elasticsearch.
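
The two behaviours described above can be sketched roughly as follows. This is
not Gora's actual API: KeyRange and both methods are simplified, hypothetical
stand-ins for the query and partition types, and ranges are assumed to be
half-open over string keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch only: simplified, hypothetical stand-ins for Gora's query types. */
final class PartitionSketch {

    /** A half-open key range [start, end); a null bound means "unbounded". */
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** No native partitioning: the whole query becomes one partition. */
    static List<KeyRange> singlePartition(KeyRange query) {
        return Collections.singletonList(query);
    }

    /** Range partitioning: intersect the query with each existing partition range. */
    static List<KeyRange> splitByRanges(KeyRange query, List<KeyRange> partitions) {
        List<KeyRange> result = new ArrayList<>();
        for (KeyRange p : partitions) {
            String start = laterStart(query.start, p.start);
            String end = earlierEnd(query.end, p.end);
            // Keep only non-empty intersections.
            if (start == null || end == null || start.compareTo(end) < 0) {
                result.add(new KeyRange(start, end));
            }
        }
        return result;
    }

    // A null start means "from the beginning", so the other bound wins.
    static String laterStart(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.compareTo(b) >= 0 ? a : b;
    }

    // A null end means "to the end", so the other bound wins.
    static String earlierEnd(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        KeyRange query = new KeyRange("c", "k");
        List<KeyRange> partitions = Arrays.asList(
            new KeyRange(null, "g"), new KeyRange("g", "p"), new KeyRange("p", null));
        System.out.println(singlePartition(query));
        System.out.println(splitByRanges(query, partitions));
    }
}
```

With one sub-query per partition, a Hadoop job can assign each sub-query to its
own input split; with a single partition, the whole query runs as one split.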

Best,
John

On Sat, 13 Feb 2021 at 0:15, Maria Podorvanova (<
podorvanova.ma...@gmail.com>) wrote:

> Hi,
>
> Report #10
> Week 10: January, 7 - February, 13
> Activities:
> - Implemented newQuery method
> - Implemented deleteByQuery method
> - Used an Enum instead of literal strings for the Authentication Type
> parameter
> - Used parameterized logging instead of string concatenation
> - Implemented execute method
> - Implemented getPartitions method
> - The following tests are passing now:
>
>    1. testTruncateSchema
>    2. testDeleteSchema
>    3. testQueryWebPageQueryEmptyResults
>    4. testResultSize
>    5. testResultSizeStartKey
>    6. testResultSizeEndKey
>    7. testResultSizeWithLimit
>    8. testResultSizeStartKeyWithLimit
>    9. testResultSizeEndKeyWithLimit
>    10. testResultSizeKeyRangeWithLimit
>
> - Filled out and sent Outreachy internship feedback to Apache
>
> Here is the link to my code:
> https://github.com/apache/gora/compare/master...podorvanova:gora-664.
> Relevant commits are from February 10.
>
> Questions:
>
>    1. This week I worked on implementing the query functionality. While
>    testing, I found that the Elasticsearch "_id" field does not support range
>    queries, which are required for the deleteByQuery method, so I am a little
>    confused about what I should do in this case.
>    2. I roughly understand that the getPartitions method is needed to
>    implement the Hadoop support. I looked through other modules and found that
>    the method is implemented the same way everywhere, so I did the same for
>    now. Could you tell me more about this method or maybe provide some
>    resources?
>
>
> Regards,
> Maria
>


Re: Add datastore for Elasticsearch. Outreachy Week 10 Report

2021-02-15 Thread Maria Podorvanova
Hi John,

Thank you for your answers.

1) The type of the Elasticsearch "_id" field is string. I am not sure that
just copying the "_id" field contents will fix the problem, as "_id" can
still be an arbitrary string value (i.e. not necessarily an integer).

2) Elasticsearch does not support partitioning, so I will keep the
single-partition implementation.
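
To illustrate the concern in point 1: a range over arbitrary string keys is
still well defined, but it is lexicographic rather than numeric. The sketch
below uses a plain Java TreeMap as a stand-in; the class name is hypothetical,
and it is only an assumption that a keyword-style field orders plain ASCII keys
the same way.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Sketch only: StringKeyRangeSketch is a hypothetical name used for illustration. */
final class StringKeyRangeSketch {

    /** Returns the entries whose keys lie between startKey and endKey, both inclusive. */
    static NavigableMap<String, String> range(
            NavigableMap<String, String> data, String startKey, String endKey) {
        return data.subMap(startKey, true, endKey, true);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> data = new TreeMap<>();
        for (String key : new String[] {"1", "2", "9", "10"}) {
            data.put(key, "value-" + key);
        }
        // Strings are compared character by character, so "10" sorts between
        // "1" and "2" and falls inside the range from "1" to "2".
        System.out.println(range(data, "1", "2").keySet());
    }
}
```

Whether this ordering is acceptable depends on how the keys are compared
elsewhere in the datastore.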

Regards,
Maria



Re: Outreachy 2020-2021 - Neo4j - Weekly reports.

2021-02-15 Thread John Mora
Hi Gaby

Thanks for the update.

Overall the code looks good, I do not have specific feedback for you this
week.

According to your proposed timeline, you should now start working on the
Query features, so let's go ahead. Let me know if you have questions.


Thanks,
John

On Sat, 13 Feb 2021 at 0:57, gabriela ortiz wrote:

> Hi all.
>
> I wanted to report the tasks I worked on this week: Feb 06 - Feb 12.
>
> * Enhance variable names.
> * Add enum for neo4j protocols.
> * Enhance getUnionSchema method for Maps.
> * Implement partitions.
> * Activate tests:
>   testUpdate
>   testGetRecursive
>   testGetDoubleRecursive
>   testGetWebPage
>   testGetWebPageDefaultFields
>
> Also, I started working on my C.V.
>
> My code is here: https://github.com/mgov88/gora/tree/GORA-663
>
> Regards,
> Gaby
>
> On Wed, 10 Feb 2021 at 21:33, gabriela ortiz (
> arqgabyor...@gmail.com) wrote:
>
>> Hi John.
>>
>> Thanks for the feedback. I will work on your comments.
>>
>> Regards,
>> Gaby
>>
>>
>> On Wed, 10 Feb 2021 at 12:04, John Mora (jhnmora...@gmail.com)
>> wrote:
>>
>>> Hi Gaby
>>>
>>> Thanks for the update.
>>>
>>> BTW, I am sorry that I did not provide feedback on your code last week;
>>> I have been busy.
>>>
>>> Some comments:
>>>
>>> Please use more descriptive variable names:
>>>
>>>
>>> https://github.com/mgov88/gora/blob/GORA-663/gora-neo4j/src/main/java/org/apache/gora/neo4j/store/Neo4jStore.java#L368
>>>
>>> https://github.com/mgov88/gora/blob/GORA-663/gora-neo4j/src/main/java/org/apache/gora/neo4j/store/Neo4jStore.java#L165
>>>
>>> https://github.com/mgov88/gora/blob/GORA-663/gora-neo4j/src/main/java/org/apache/gora/neo4j/store/Neo4jStore.java#L171
>>>
>>> https://github.com/mgov88/gora/blob/GORA-663/gora-neo4j/src/main/java/org/apache/gora/neo4j/store/Neo4jStore.java#L193
>>>
>>> https://github.com/mgov88/gora/blob/GORA-663/gora-neo4j/src/main/java/org/apache/gora/neo4j/store/Neo4jStore.java#L194
>>>
>>> https://github.com/mgov88/gora/blob/GORA-663/gora-neo4j/src/main/java/org/apache/gora/neo4j/store/Neo4jStore.java#L200
>>>
>>> https://github.com/mgov88/gora/blob/GORA-663/gora-neo4j/src/main/java/org/apache/gora/neo4j/store/Neo4jStore.java#L206
>>>
>>> https://github.com/mgov88/gora/blob/GORA-663/gora-neo4j/src/main/java/org/apache/gora/neo4j/store/Neo4jStore.java#L216
>>>
>>> Typo:
>>>
>>> https://github.com/mgov88/gora/blob/GORA-663/gora-neo4j/src/main/java/org/apache/gora/neo4j/store/Neo4jStore.java#L216
>>>
>>> Avoid string concatenation:
>>>
>>> https://github.com/mgov88/gora/blob/GORA-663/gora-neo4j/src/main/java/org/apache/gora/neo4j/store/Neo4jStore.java#L307
>>>
>>> Use an Enum instead of string literals:
>>>
>>> https://github.com/mgov88/gora/blob/GORA-663/gora-neo4j/src/main/java/org/apache/gora/neo4j/store/Neo4jStore.java#L129
>>>
>>>
>>> regards
>>> John
>>>
>>> On Mon, 8 Feb 2021 at 2:08, gabriela ortiz wrote:
>>>
>>>> Hi all.
>>>>
>>>> I wanted to report the tasks I worked on this week: Jan 30 - Feb 05.
>>>>
>>>> * Enhance the deleteSchema method (delete existing nodes when deleting
>>>> the schema constraints)
>>>> * Enhance Map, Record, Array and Bytes serialization / de-serialization
>>>> process using Base64 encoding.
>>>> * Activate tests:
>>>>   testPutNested
>>>>   testPutArray
>>>>   testPutBytes
>>>>   testPutMap
>>>>   testPutMixedMaps
>>>>   testGetNested
>>>>   testGet3UnionField
>>>>   testGetWithFields
>>>>
>>>> My code is here: https://github.com/mgov88/gora/tree/GORA-663
>>>>
>>>> Regards,
>>>> Gaby
>>>>
>>>> On Mon, 1 Feb 2021 at 01:53, gabriela ortiz (
>>>> arqgabyor...@gmail.com) wrote:
>>>>
>>>>> Hi all.
>>>>>
>>>>> I wanted to report the tasks I worked on this week: Jan 17 - Jan 29.
>>>>>
>>>>> * Add suggested javadocs.
>>>>> * Add suggested constants.
>>>>> * Make EXIST constraints optional.
>>>>> * Activate tests:
>>>>>   testNewInstance
>>>>>   testAutoCreateSchema
>>>>>   testPut
>>>>>   testBenchmarkExists
>>>>>   testGetNonExisting
>>>>>   testObjectFieldValue
>>>>> * Write a blog (Career opportunities)
>>>>>
>>>>> My code is here: https://github.com/mgov88/gora/tree/GORA-663
>>>>>
>>>>> Regards,
>>>>> Gaby
>>>>>
>>>>>
>>>>> On Wed, 27 Jan 2021 at 12:00, John Mora (
>>>>> jhnmora...@gmail.com) wrote:
>>>>>
>>>>>> Hi Gaby
>>>>>>
>>>>>> Thanks for your report.
>>>>>>
>>>>>> Some comments:
>>>>>>
>>>>>> Please use constants instead of literal values here:
>>>>>>
>>>>>> https://github.com/mgov88/gora/blob/GORA-663/gora-neo4j/src/main/java/org/apache/gora/neo4j/store/Neo4jStore.java#L204
>>>>>>
>>>>>> https://github.com/mgov88/gora/blob/GORA-663/gora-neo4j/src/main/java/org/apache/gora/neo4j/store/Neo4jStore.java#L224
>>>>>>
>>>>>> https://github.com/mgov88/gora/blob/GORA-663/gora-neo4j/src/main/java/org/apache/gora/neo4j/store/Neo4jStore.java#L251