Hi John,

Thank you for your answers.

1) The Elasticsearch "_id" field is of type string. I am not sure that just
copying the "_id" field contents will fix the problem, since the value can
still be an arbitrary string (i.e. not necessarily an integer).
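
A copy of the key might still help even though it remains a string. The minimal sketch below follows John's "copy the key in put" suggestion and assumes that the copied field (the hypothetical `gora_id`) is mapped as an Elasticsearch `keyword`, whose range filters compare values as strings rather than numbers. The `TreeMap` only stands in for the index. No Elasticsearch client calls are shown, the inclusive bounds are an assumption, and whether this ordering matches Gora's key semantics would still need checking.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class GoraIdCopySketch {

    // Hypothetical field that mirrors "_id"; assumed to be mapped as "keyword".
    static final String GORA_ID_FIELD = "gora_id";

    // Stand-in for the index: documents kept in string order of the copied key.
    private final NavigableMap<String, Map<String, Object>> index = new TreeMap<>();

    /** Mimics put(key, obj): store the fields plus a copy of the key in a range-queryable field. */
    public void put(String key, Map<String, Object> fields) {
        Map<String, Object> document = new HashMap<>(fields);
        document.put(GORA_ID_FIELD, key); // the extra copy that a range filter can target
        index.put(key, document);
    }

    /** Mimics a range filter on gora_id; both bounds inclusive (an assumption of this sketch). */
    public Map<String, Map<String, Object>> range(String startKey, String endKey) {
        return index.subMap(startKey, true, endKey, true);
    }

    public static void main(String[] args) {
        GoraIdCopySketch store = new GoraIdCopySketch();
        store.put("com.example/a", Map.of("title", "A"));
        store.put("com.example/b", Map.of("title", "B"));
        store.put("org.apache/gora", Map.of("title", "Gora"));
        // Arbitrary string keys still have a well-defined lexicographic order.
        System.out.println(store.range("com.example/a", "com.example/b").keySet());
    }
}
```

For plain ASCII keys, this string ordering is the same one Java's `String.compareTo` uses, so the same comparison could also drive a key-range delete.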

2) Elasticsearch does not support partitioning, so I will keep the
single-partition implementation.
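
For contrast with the single-partition approach, here is a minimal plain-Java sketch of what John describes for range-partitioned stores such as Kudu. The query's key range is cut along the store's partition boundaries, and when there are no boundaries the whole query comes back as one partition. `KeyRange`, `split`, and the split points are hypothetical. They illustrate the logic only and are not Gora's `getPartitions` API.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSplitSketch {

    /** Hypothetical key range: start inclusive ("" = from the first key), end exclusive (null = unbounded). */
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + (end == null ? "+inf" : end) + ")";
        }
    }

    /** Smaller of two upper bounds, where null means "unbounded". */
    static String minUpper(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    /**
     * Cuts the query range along sorted split points: partition i covers
     * [splitPoints[i-1], splitPoints[i]), with the first and last partitions open-ended.
     * No split points means one partition covering the whole query.
     */
    static List<KeyRange> split(KeyRange query, List<String> splitPoints) {
        List<KeyRange> partitions = new ArrayList<>();
        String lower = ""; // "" sorts before every other string, so it acts as "unbounded below"
        for (int i = 0; i <= splitPoints.size(); i++) {
            String upper = i < splitPoints.size() ? splitPoints.get(i) : null;
            // Intersect the current partition with the query range.
            String start = lower.compareTo(query.start) >= 0 ? lower : query.start;
            String end = minUpper(upper, query.end);
            if (end == null || start.compareTo(end) < 0) { // skip empty intersections
                partitions.add(new KeyRange(start, end));
            }
            lower = upper;
        }
        return partitions;
    }

    public static void main(String[] args) {
        KeyRange query = new KeyRange("c", "t");
        // Range-partitioned store with hypothetical boundaries "g" and "p": three sub-queries.
        System.out.println(split(query, List.of("g", "p")));
        // No native partitioning: the whole query is the only partition.
        System.out.println(split(query, List.of()));
    }
}
```

Each sub-range could then be handed to a separate Hadoop task. With an empty boundary list, the method degenerates to the single-partition behaviour kept for Elasticsearch.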

Regards,
Maria

On Tue, 16 Feb 2021 at 09:14, John Mora <jhnmora...@gmail.com> wrote:

> Hi Maria,
>
> Thanks for the update.
>
> 1) I think you can copy the content of _id to a manually created field,
> let's say 'gora_id', using copy_to.
>
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html
>
> But I have not tried it yet, so I am not sure if this will work.
>
> Alternatively, in the put method of the datastore, you can manually copy
> the value of the key to a field that supports range queries.
>
> 2) In some databases you can split your data into partitions, generally by
> defining ranges over the primary key.
>
> Kudu is an example of this:
> https://kudu.apache.org/docs/schema_design.html#range-partitioning
>
> In this case, getPartitions should split a query along the existing
> partition ranges.
> Kudu example:
>
> https://github.com/apache/gora/blob/master/gora-kudu/src/main/java/org/apache/gora/kudu/store/KuduStore.java#L383
>
> If the database does not support partitioning, this method returns only a
> single partition (the whole table/collection).
> This is probably the implementation that you saw.
>
> I think Elasticsearch does not support partitioning; in that case, your
> implementation is fine, but I am not an expert in Elasticsearch.
>
> Best,
> John
>
> On Sat, 13 Feb 2021 at 0:15, Maria Podorvanova (<
> podorvanova.ma...@gmail.com>) wrote:
>
>> Hi,
>>
>> Report #10
>> Week 10: February 7 - February 13
>> Activities:
>> - Implemented the newQuery method
>> - Implemented the deleteByQuery method
>> - Used an Enum instead of literal strings for the Authentication Type
>> parameter
>> - Used parameterized logging instead of string concatenation
>> - Implemented the execute method
>> - Implemented the getPartitions method
>> - The following tests are passing now:
>>
>>    1. testTruncateSchema
>>    2. testDeleteSchema
>>    3. testQueryWebPageQueryEmptyResults
>>    4. testResultSize
>>    5. testResultSizeStartKey
>>    6. testResultSizeEndKey
>>    7. testResultSizeWithLimit
>>    8. testResultSizeStartKeyWithLimit
>>    9. testResultSizeEndKeyWithLimit
>>    10. testResultSizeKeyRangeWithLimit
>>
>> - Filled out and sent Outreachy internship feedback to Apache
>>
>> Here is the link to my code:
>> https://github.com/apache/gora/compare/master...podorvanova:gora-664.
>> Relevant commits are from February 10.
>>
>> Questions:
>>
>>    1. This week I worked on implementing the query functionality. While
>>    testing, I found that the Elasticsearch "_id" field does not support
>>    range queries, which the deleteByQuery method requires. So I am not
>>    sure what I should do in this case.
>>    2. I roughly understand that the getPartitions method is needed to
>>    implement Hadoop support. I looked through other modules and found that
>>    the method is implemented the same way everywhere, so I did the same for
>>    now. Could you tell me more about this method or point me to some
>>    resources?
>>
>>
>> Regards,
>> Maria
>>
>
