Re: Add datastore for Elasticsearch. Outreachy Week 10 Report
Hi John,

Thank you for your answers.

1) The type of the Elasticsearch "_id" field is string. I am not sure that simply copying the "_id" contents into another field will fix the problem, since "_id" can still be an arbitrary string value (i.e. not necessarily an integer).

2) Elasticsearch does not support partitioning, so I will keep the single-partition implementation.

Regards,
Maria

On Tue, 16 Feb 2021 at 09:14, John Mora wrote:

> Hi Maria,
>
> Thanks for the update.
>
> 1) I think you can copy the content from _id to a manually created field,
> let's say 'gora_id', using copy_to.
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html
>
> But I have not tried it yet, so I am not sure if this will work.
>
> Alternatively, you can manually copy the value of the key to a field that
> can be range queried in the put method of the datastore.
>
> 2) In some databases you can split your data into partitions, generally
> defining ranges for the primary key.
>
> Kudu is an example of this:
> https://kudu.apache.org/docs/schema_design.html#range-partitioning
>
> In this case, getPartitions should split a query using the existing
> partition ranges. Kudu example:
>
> https://github.com/apache/gora/blob/master/gora-kudu/src/main/java/org/apache/gora/kudu/store/KuduStore.java#L383
>
> If the database does not support partitioning, this method only returns a
> single partition (the whole table/collection).
> This is probably the implementation that you saw.
>
> I think Elasticsearch does not support partitioning; in that case your
> implementation is fine, but I am not an expert in Elasticsearch.
>
> Best,
> John
>
> On Sat, 13 Feb 2021 at 0:15, Maria Podorvanova (<podorvanova.ma...@gmail.com>) wrote:
>
>> Hi,
>>
>> Report #10
>> Week 10: January, 7 - February, 13
>>
>> Activities:
>> - Implemented newQuery method
>> - Implemented deleteByQuery method
>> - Used an Enum instead of literal strings for the Authentication Type
>>   parameter
>> - Used parameterized logging instead of string concatenation
>> - Implemented execute method
>> - Implemented getPartitions method
>> - The following tests are passing now:
>>   1. testTruncateSchema
>>   2. testDeleteSchema
>>   3. testQueryWebPageQueryEmptyResults
>>   4. testResultSize
>>   5. testResultSizeStartKey
>>   6. testResultSizeEndKey
>>   7. testResultSizeWithLimit
>>   8. testResultSizeStartKeyWithLimit
>>   9. testResultSizeEndKeyWithLimit
>>   10. testResultSizeKeyRangeWithLimit
>> - Filled out and sent Outreachy internship feedback to Apache
>>
>> Here is the link to my code:
>> https://github.com/apache/gora/compare/master...podorvanova:gora-664
>> Relevant commits are from February 10.
>>
>> Questions:
>>
>> 1. This week I worked on implementing the query functionality. While
>>    testing, I found that the Elasticsearch "_id" field does not support
>>    range queries, which are required for the deleteByQuery method. So I am
>>    a little confused about what I should do in this case.
>> 2. I roughly understand that the getPartitions method is needed to
>>    implement Hadoop support. I looked through other modules and found that
>>    the method is implemented the same way everywhere, so I did the same
>>    for now. Could you tell me more about this method or maybe provide some
>>    resources?
>>
>> Regards,
>> Maria
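
On the concern that "_id" is an arbitrary string rather than an integer: a range over string keys is evaluated in lexicographic order, which gives different results from a numeric range when the keys happen to look like numbers. The sketch below is plain Java, not Elasticsearch client code, and the class and method names are made up for illustration. It only shows the ordering behaviour that a string-typed key field would have, assuming inclusive start and end keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Gora or Elasticsearch.
class StringKeyRange {

    /** Returns the keys k with startKey <= k <= endKey, compared as strings. */
    static List<String> inRange(List<String> keys, String startKey, String endKey) {
        List<String> result = new ArrayList<>();
        for (String key : keys) {
            // String.compareTo orders lexicographically, character by character.
            if (key.compareTo(startKey) >= 0 && key.compareTo(endKey) <= 0) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("1", "2", "9", "10", "25");
        // "10" sorts between "1" and "2" as a string, so it is included.
        System.out.println(inRange(keys, "1", "2")); // prints [1, 2, 10]
    }
}
```

If the keys used by the datastore are ordinary strings, lexicographic ranges may be exactly what is wanted; the difference only matters when callers expect numeric ordering.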
Re: Add datastore for Elasticsearch. Outreachy Week 10 Report
Hi Maria,

Thanks for the update.

1) I think you can copy the content from _id to a manually created field, let's say 'gora_id', using copy_to.

https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html

But I have not tried it yet, so I am not sure if this will work.

Alternatively, you can manually copy the value of the key to a field that can be range queried in the put method of the datastore.

2) In some databases you can split your data into partitions, generally defining ranges for the primary key.

Kudu is an example of this:
https://kudu.apache.org/docs/schema_design.html#range-partitioning

In this case, getPartitions should split a query using the existing partition ranges. Kudu example:

https://github.com/apache/gora/blob/master/gora-kudu/src/main/java/org/apache/gora/kudu/store/KuduStore.java#L383

If the database does not support partitioning, this method only returns a single partition (the whole table/collection). This is probably the implementation that you saw.

I think Elasticsearch does not support partitioning; in that case your implementation is fine, but I am not an expert in Elasticsearch.

Best,
John

On Sat, 13 Feb 2021 at 0:15, Maria Podorvanova (<podorvanova.ma...@gmail.com>) wrote:

> Hi,
>
> Report #10
> Week 10: January, 7 - February, 13
>
> Activities:
> - Implemented newQuery method
> - Implemented deleteByQuery method
> - Used an Enum instead of literal strings for the Authentication Type
>   parameter
> - Used parameterized logging instead of string concatenation
> - Implemented execute method
> - Implemented getPartitions method
> - The following tests are passing now:
>   1. testTruncateSchema
>   2. testDeleteSchema
>   3. testQueryWebPageQueryEmptyResults
>   4. testResultSize
>   5. testResultSizeStartKey
>   6. testResultSizeEndKey
>   7. testResultSizeWithLimit
>   8. testResultSizeStartKeyWithLimit
>   9. testResultSizeEndKeyWithLimit
>   10. testResultSizeKeyRangeWithLimit
> - Filled out and sent Outreachy internship feedback to Apache
>
> Here is the link to my code:
> https://github.com/apache/gora/compare/master...podorvanova:gora-664
> Relevant commits are from February 10.
>
> Questions:
>
> 1. This week I worked on implementing the query functionality. While
>    testing, I found that the Elasticsearch "_id" field does not support
>    range queries, which are required for the deleteByQuery method. So I am
>    a little confused about what I should do in this case.
> 2. I roughly understand that the getPartitions method is needed to
>    implement Hadoop support. I looked through other modules and found that
>    the method is implemented the same way everywhere, so I did the same
>    for now. Could you tell me more about this method or maybe provide some
>    resources?
>
> Regards,
> Maria
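
To make the two getPartitions behaviours described above concrete, here is a minimal sketch in plain Java. It does not use Gora's actual Query or PartitionQuery types: `KeyRange`, `singlePartition`, and `splitByPartitions` are hypothetical names, and key ranges are assumed to be half-open with `null` meaning "unbounded". A store without partitioning returns the query as its only partition; a store with range partitions returns one sub-range per partition that overlaps the query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; these are not Gora's real classes.
class PartitionSketch {

    /** Key range [start, end); null means unbounded on that side. */
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** No partitioning support: the whole query is the single partition. */
    static List<KeyRange> singlePartition(KeyRange query) {
        return Collections.singletonList(query);
    }

    /** Range partitioning: one sub-range per partition that overlaps the query. */
    static List<KeyRange> splitByPartitions(KeyRange query, List<KeyRange> partitions) {
        List<KeyRange> result = new ArrayList<>();
        for (KeyRange p : partitions) {
            String start = maxStart(query.start, p.start);
            String end = minEnd(query.end, p.end);
            // Keep the intersection only if it is non-empty.
            if (start == null || end == null || start.compareTo(end) < 0) {
                result.add(new KeyRange(start, end));
            }
        }
        return result;
    }

    // For start bounds, null means "from the beginning", so a non-null bound is larger.
    private static String maxStart(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.compareTo(b) >= 0 ? a : b;
    }

    // For end bounds, null means "to the end", so a non-null bound is smaller.
    private static String minEnd(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }
}
```

For example, with partitions [null, g), [g, p), [p, null) and a query over [c, k), `splitByPartitions` yields [c, g) and [g, k), while `singlePartition` simply returns [c, k).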
Add datastore for Elasticsearch. Outreachy Week 10 Report
Hi,

Report #10
Week 10: January, 7 - February, 13

Activities:
- Implemented newQuery method
- Implemented deleteByQuery method
- Used an Enum instead of literal strings for the Authentication Type parameter
- Used parameterized logging instead of string concatenation
- Implemented execute method
- Implemented getPartitions method
- The following tests are passing now:
  1. testTruncateSchema
  2. testDeleteSchema
  3. testQueryWebPageQueryEmptyResults
  4. testResultSize
  5. testResultSizeStartKey
  6. testResultSizeEndKey
  7. testResultSizeWithLimit
  8. testResultSizeStartKeyWithLimit
  9. testResultSizeEndKeyWithLimit
  10. testResultSizeKeyRangeWithLimit
- Filled out and sent Outreachy internship feedback to Apache

Here is the link to my code:
https://github.com/apache/gora/compare/master...podorvanova:gora-664
Relevant commits are from February 10.

Questions:

1. This week I worked on implementing the query functionality. While testing, I found that the Elasticsearch "_id" field does not support range queries, which are required for the deleteByQuery method. So I am a little confused about what I should do in this case.
2. I roughly understand that the getPartitions method is needed to implement Hadoop support. I looked through other modules and found that the method is implemented the same way everywhere, so I did the same for now. Could you tell me more about this method or maybe provide some resources?

Regards,
Maria
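
For readers who have not opened the linked branch, the two refactorings listed under Activities (an Enum instead of literal strings, and parameterized logging instead of string concatenation) can be sketched roughly as follows. This is not the code from the branch: the class name, the enum constants, and the `parse` helper are assumptions for illustration, and `java.util.logging` with its `{0}` placeholders is used here only to keep the example free of external dependencies (SLF4J-style loggers use `{}` instead).

```java
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; names and enum constants are assumptions.
class AuthConfigSketch {
    private static final Logger LOG = Logger.getLogger(AuthConfigSketch.class.getName());

    /** Allowed authentication types, replacing free-form string comparisons. */
    enum AuthenticationType { BASIC, TOKEN, APIKEY }

    /** Parses a configuration value; unknown values fail with IllegalArgumentException. */
    static AuthenticationType parse(String raw) {
        AuthenticationType type =
                AuthenticationType.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        // Parameterized logging: the message is only formatted if INFO is enabled,
        // unlike "Using authentication type " + type, which always concatenates.
        LOG.log(Level.INFO, "Using authentication type {0}", type);
        return type;
    }
}
```

With an enum, a misspelled value in the configuration is rejected at parse time instead of silently falling through a chain of string comparisons.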