Hi John,

Thank you for your answers.

1) The type of the Elasticsearch "_id" field is string. I am not sure that just copying the "_id" field contents will fix the problem, since "_id" can still be an arbitrary string value (i.e. not necessarily an integer).

2) Elasticsearch does not support partitioning, so I will keep the single-partition implementation.

Regards,
Maria

On Tue, 16 Feb 2021 at 09:14, John Mora <jhnmora...@gmail.com> wrote:
> Hi Maria,
>
> Thanks for the update.
>
> 1) I think you can copy the content from _id to a manually created field,
> let's say 'gora_id', using copy_to.
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html
>
> But I have not tried it yet, so I am not sure if this will work.
>
> Alternatively, you can manually copy the value of the key to a field that
> can be range queried in the put method of the datastore.
>
> 2) In some databases you can split your data into partitions, generally
> by defining ranges for the primary key.
>
> Kudu is an example of this:
> https://kudu.apache.org/docs/schema_design.html#range-partitioning
>
> In this case, getPartitions should split a query using the existing
> partition ranges.
> Kudu example:
>
> https://github.com/apache/gora/blob/master/gora-kudu/src/main/java/org/apache/gora/kudu/store/KuduStore.java#L383
>
> If the database does not support partitioning, this method only returns a
> single partition (the whole table/collection).
> This is probably the implementation that you saw.
>
> I think Elasticsearch does not support partitioning; in that case your
> implementation is fine, but I am not an expert in Elasticsearch.
>
> Best,
> John
>
> On Sat, 13 Feb 2021 at 0:15, Maria Podorvanova (<podorvanova.ma...@gmail.com>) wrote:
>
>> Hi,
>>
>> Report #10
>> Week 10: January 7 - February 13
>> Activities:
>> - Implemented the newQuery method
>> - Implemented the deleteByQuery method
>> - Used an Enum instead of literal strings for the Authentication Type parameter
>> - Used parameterized logging instead of string concatenation
>> - Implemented the execute method
>> - Implemented the getPartitions method
>> - The following tests are passing now:
>>
>> 1. testTruncateSchema
>> 2. testDeleteSchema
>> 3. testQueryWebPageQueryEmptyResults
>> 4. testResultSize
>> 5. testResultSizeStartKey
>> 6. testResultSizeEndKey
>> 7. testResultSizeWithLimit
>> 8. testResultSizeStartKeyWithLimit
>> 9. testResultSizeEndKeyWithLimit
>> 10. testResultSizeKeyRangeWithLimit
>>
>> - Filled out and sent the Outreachy internship feedback to Apache
>>
>> Here is the link to my code:
>> https://github.com/apache/gora/compare/master...podorvanova:gora-664
>> The relevant commits are from February 10.
>>
>> Questions:
>>
>> 1. This week I worked on implementing the query functionality. While
>> testing, I found that the Elasticsearch "_id" field does not support range
>> queries, which the deleteByQuery method requires. I am a little
>> confused about what I should do in this case.
>> 2. I roughly understand that the getPartitions method is needed to
>> implement Hadoop support. I looked through other modules and found that
>> the method is implemented the same way everywhere, so I did the same for
>> now. Could you tell me more about this method or point me to some
>> resources?
>>
>> Regards,
>> Maria
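A minimal sketch of John's second suggestion for question 1: copy the key into a separate, range-queryable field inside put(), then delete by key range on that field. It assumes an in-memory TreeMap as a stand-in for the index, not the Elasticsearch client API; the class name KeyCopyRangeSketch, the field name gora_id, and the deleteByRange and get helpers are hypothetical. It also bears on the concern about arbitrary strings: in Elasticsearch a range query on a keyword field compares values as strings, so the copied key need not be an integer. The sketch approximates that ordering with String.compareTo (the two agree for ASCII keys) and assumes inclusive start and end keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory stand-in for an index (hypothetical, NOT the Elasticsearch client API).
 * put() copies the key into a "gora_id" field; deleteByRange() selects documents
 * by string order of that copied key, as a keyword range query would.
 */
final class KeyCopyRangeSketch {

    // Documents kept sorted by the copied key (lexicographic String order).
    private final TreeMap<String, Map<String, Object>> byGoraId = new TreeMap<>();

    /** Stores a document and copies its key into the hypothetical "gora_id" field. */
    void put(String key, Map<String, Object> fields) {
        Map<String, Object> doc = new HashMap<>(fields);
        doc.put("gora_id", key); // the copy that range queries can target
        byGoraId.put(key, doc);
    }

    /** Returns the stored document for a key, or null if absent. */
    Map<String, Object> get(String key) {
        return byGoraId.get(key);
    }

    /** Deletes documents whose gora_id lies in [startKey, endKey]; returns how many were removed. */
    int deleteByRange(String startKey, String endKey) {
        Map<String, Map<String, Object>> hits = byGoraId.subMap(startKey, true, endKey, true);
        int removed = hits.size();
        hits.clear(); // subMap is a view, so clearing it removes entries from byGoraId
        return removed;
    }

    /** Remaining keys in sorted order. */
    List<String> keys() {
        return new ArrayList<>(byGoraId.keySet());
    }
}
```

For example, after putting documents under the keys "a1", "b7", "c3" and "x", deleteByRange("b", "c9") removes "b7" and "c3" and returns 2, even though none of the keys are integers.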
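To illustrate John's explanation of getPartitions for question 2, here is a sketch using plain string key ranges instead of Gora's query types. PartitionSketch, KeyRange and the splitPoints parameter are hypothetical names, and ranges are assumed half-open, [start, end), with null meaning unbounded. A range-partitioned store (as in the Kudu example) clips the query to each partition. A store without partitioning behaves like an empty split-point list and returns the whole query as a single partition.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the two getPartitions strategies, using plain strings as keys. */
final class PartitionSketch {

    /** Key range [start, end): start inclusive, end exclusive, null means unbounded. */
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + (start == null ? "-inf" : start) + ", " + (end == null ? "+inf" : end) + ")";
        }
    }

    /** Larger of two lower bounds; null means unbounded below. */
    private static String maxLower(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.compareTo(b) >= 0 ? a : b;
    }

    /** Smaller of two upper bounds; null means unbounded above. */
    private static String minUpper(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    /**
     * Splits the query along sorted partition boundaries. With no split points
     * there is one unbounded partition, so the whole query comes back as one range.
     */
    static List<KeyRange> getPartitions(KeyRange query, List<String> splitPoints) {
        List<KeyRange> result = new ArrayList<>();
        String partStart = null; // the first partition is unbounded below
        for (int i = 0; i <= splitPoints.size(); i++) {
            // The last partition is unbounded above.
            String partEnd = i < splitPoints.size() ? splitPoints.get(i) : null;
            String lo = maxLower(query.start, partStart);
            String hi = minUpper(query.end, partEnd);
            // Keep the intersection only if some key can fall inside it.
            boolean nonEmpty = hi == null || (lo == null ? !hi.isEmpty() : lo.compareTo(hi) < 0);
            if (nonEmpty) {
                result.add(new KeyRange(lo, hi));
            }
            partStart = partEnd;
        }
        return result;
    }
}
```

For example, a query [b, m) with split points c and k yields [b, c), [c, k) and [k, m); with no split points it yields the single range [b, m), which corresponds to the single-partition implementation discussed above.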