[jira] [Commented] (BEAM-3947) Add support for Solr 6.x/7.x
[ https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468539#comment-16468539 ]

Cao Manh Dat commented on BEAM-3947:

{quote}What do you think about the idea of structuring the {{SolrIO}} so there is one built specifically for Solr7 that we can then focus on bringing in new features / performance gains from now on? {quote}
I like this idea, as well as the idea of scoping {{solr-solrj}} as provided. What about creating a new one that supports only Solr 6 and 7?

> Add support for Solr 6.x/7.x
>
> Key: BEAM-3947
> URL: https://issues.apache.org/jira/browse/BEAM-3947
> Project: Beam
> Issue Type: Improvement
> Components: io-java-solr
> Reporter: Ismaël Mejía
> Assignee: Cao Manh Dat
> Priority: Minor
>
> The initial PR on Solr was based on Solr 6.x; however, at that time we
> supported Java 7, so I insisted on moving it to Solr 5.x (which is Java 7
> compatible). This issue is to add support for multiple versions of Solr,
> ideally in a single module.
> Note that I was able to recover the original code for Solr 6.x created by
> [~caomanhdat] here (there are some differences in the way the Split was
> calculated and maybe some other minor things):
> [https://github.com/iemejia/beam/blob/recover-solrio/sdks/java/io/solr/pom.xml]
> This issue does not cover support for Solr 7, but it would be great if that
> could be added as part of it.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3947) Add support for Solr 6.x/7.x
[ https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468518#comment-16468518 ]

Cao Manh Dat commented on BEAM-3947:

Consider how client creation changes across solr-solrj versions:
* solr-solrj-5.x: {{method1}} is used to create a new client.
* solr-solrj-6.x: a new method, {{method2}}, is introduced to create a new client; {{method1}} is deprecated.
* solr-solrj-7.x: {{method1}} is removed.

I am afraid that if we scope {{solr-solrj}} as provided, users who supply solr-solrj-7.x will hit {{NoSuchMethodError}}, because {{method1}} is no longer available.
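The version drift described in this comment can be illustrated with a small, language-neutral sketch in Python. It is not SolrJ code: the three classes are hypothetical stand-ins for the 5.x, 6.x and 7.x client libraries, and {{method1}} / {{method2}} are the placeholder names used in the comment. The sketch only shows the probing idea, which is to try the newer factory first and fall back to the older one, so that the same caller works against whichever version is supplied at runtime. In Java the equivalent would need reflection or separate modules, because a direct call to a removed method fails at link time.

```python
# Hypothetical stand-ins for three versions of a client library.
# None of these names exist in SolrJ; they mirror the placeholders
# method1/method2 used in the comment above.

class SolrJ5:
    """5.x: only method1 exists."""
    @staticmethod
    def method1():
        return "client created via method1"


class SolrJ6:
    """6.x: method2 is introduced, method1 is deprecated but still present."""
    @staticmethod
    def method1():
        return "client created via method1"

    @staticmethod
    def method2():
        return "client created via method2"


class SolrJ7:
    """7.x: method1 has been removed."""
    @staticmethod
    def method2():
        return "client created via method2"


def create_client(solrj):
    """Create a client with the newest factory method the library offers."""
    for name in ("method2", "method1"):  # prefer the newer API
        factory = getattr(solrj, name, None)
        if callable(factory):
            return factory()
    raise RuntimeError("no known client factory method found")
```

Calling `create_client(SolrJ5)` goes through `method1`, while `create_client(SolrJ6)` and `create_client(SolrJ7)` go through `method2`. A caller hard-wired to `method1` would fail against the 7.x stand-in, which is the situation the comment warns about.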
[jira] [Comment Edited] (BEAM-3947) Add support for Solr 6.x/7.x
[ https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467496#comment-16467496 ]

Cao Manh Dat edited comment on BEAM-3947 at 5/8/18 2:46 PM:

{quote} Cao Manh Dat I am probably overlooking something simple but doesn't elasticsearch-tests have a separate module per version which only really differ in the test scope dependencies? https://github.com/apache/beam/tree/master/sdks/java/io/elasticsearch-tests {quote}
I think we will hit NoSuchMethodError if we use this approach to test against Solr 7.x.

{quote}Isn't the problem not the tests but that we (may) need different builds of the SolrIO to support API changes across versions of the SolrJ dependency? I gather that is a viable option for IO design when necessary. {quote}
The point here is that we can use the current SolrIO (which uses SolrJ 5.x) to fetch data from and write data to a Solr cluster running 5.x, 6.x, or 7.x. So I don't think we need to update the SolrJ dependency.

was (Author: caomanhdat):
{quote}Isn't the problem not the tests but that we (may) need different builds of the SolrIO to support API changes across versions of the SolrJ dependency? I gather that is a viable option for IO design when necessary. {quote}
The point here is that we can use the current SolrIO (which uses SolrJ 5.x) to fetch data from and write data to a Solr cluster running 5.x, 6.x, or 7.x. So I don't think we need to update the SolrJ dependency.
[jira] [Commented] (BEAM-3947) Add support for Solr 6.x/7.x
[ https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467496#comment-16467496 ]

Cao Manh Dat commented on BEAM-3947:

{quote}Isn't the problem not the tests but that we (may) need different builds of the SolrIO to support API changes across versions of the SolrJ dependency? I gather that is a viable option for IO design when necessary. {quote}
The point here is that we can use the current SolrIO (which uses SolrJ 5.x) to fetch data from and write data to a Solr cluster running 5.x, 6.x, or 7.x. So I don't think we need to update the SolrJ dependency.
[jira] [Commented] (BEAM-3947) Add support for Solr 6.x/7.x
[ https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467427#comment-16467427 ]

Cao Manh Dat commented on BEAM-3947:

After giving this issue a lot of thought, I think the best strategy is for Beam to support as many versions of Solr as possible (5.x, 6.x, 7.x). The problem is how to test that. In SolrIOTest.java we can only test one version of Solr (currently Solr 5), because the cluster started by the test runs in the same JVM as SolrIO.

We cannot use the same strategy as ElasticsearchIO. Elasticsearch nodes use org.elasticsearch.client.transport to communicate with each other, while ElasticsearchIO uses org.elasticsearch.client.restclient to talk to the Elasticsearch nodes. In Solr, the nodes themselves also use SolrJ to communicate, so if we spin up a Solr 6 cluster in SolrIOTest, the SolrJ version on the classpath will be SolrJ 6.

The only option left is to test the different Solr versions using Docker (SolrITTest), so that the test (SolrIO) runs in a different JVM from the Solr nodes, just as in production.

Note: the Lucene/Solr rule is that any method can be removed after two major versions (i.e., methods of Solr 5 can be removed in Solr 7).
[jira] [Comment Edited] (BEAM-3947) Add support for Solr 6.x/7.x
[ https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456285#comment-16456285 ]

Cao Manh Dat edited comment on BEAM-3947 at 4/27/18 12:45 PM:

After taking a look at the current state, I think we must discuss the goal of this issue. If we just want the pipeline to be able to read from Solr, then the current code is fine: it can read/write data from/to Solr 5.x, Solr 6.x and Solr 7.x. That is because all the uses of SolrJ (the client for a Solr cluster) in Beam right now are very basic, e.g.:
* Parsing data from ZooKeeper to find out where the Solr nodes live
* Calling some HTTP APIs that have not changed since then

It seems that we should instead focus on using the new features of Solr 6.x and Solr 7.x (we may or may not need to update SolrJ):
* Support the "/export" handler. It will make SolrIO significantly faster, since all the documents are streamed in one response and the cost of retrieving a document's fields is much lower than it is currently (column-oriented vs. row-oriented).
* Allow BoundedSolrSource.split to split the source into arbitrary smaller parts.

was (Author: caomanhdat):
After taking a look at the current state, I think we must discuss the goal of this issue. If we just want the pipeline to be able to read from Solr, then the current code is fine: it can read data from Solr 5.x, Solr 6.x and Solr 7.x. That is because all the uses of SolrJ (the client for a Solr cluster) in Beam right now are very basic, e.g.:
* Parsing data from ZooKeeper to find out where the Solr nodes live
* Calling some HTTP APIs that have not changed since then

It seems that we should instead focus on using the new features of Solr 6.x and Solr 7.x (we may or may not need to update SolrJ):
* Support the "/export" handler. It will make SolrIO significantly faster, since all the documents are streamed in one response and the cost of retrieving a document's fields is much lower than it is currently (column-oriented vs. row-oriented).
* Allow BoundedSolrSource.split to split the source into arbitrary smaller parts.
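To make the "/export" proposal above more concrete, here is a minimal sketch in Python of what such a request could look like at the HTTP level. It is not part of SolrIO. The function name, base URL and collection name are hypothetical. The sketch assumes, as I understand the Solr documentation, that the export handler needs both a field list ({{fl}}) and a sort ({{sort}}), and that the exported fields must have docValues enabled; please verify this against the Solr version in use.

```python
from urllib.parse import urlencode


def export_url(base_url, collection, query, fields, sort):
    """Build a URL for a request to the /export handler of one collection.

    base_url and collection are caller-supplied; nothing here contacts Solr.
    """
    if not fields or not sort:
        # Assumption: /export rejects requests without fl and sort.
        raise ValueError("/export needs both a field list and a sort")
    params = urlencode({
        "q": query,
        "fl": ",".join(fields),  # fields to stream back
        "sort": sort,            # e.g. "id asc"
    })
    return f"{base_url}/{collection}/export?{params}"


# Hypothetical local Solr instance and collection name.
url = export_url("http://localhost:8983/solr", "books", "*:*",
                 ["id", "title"], "id asc")
```

Because the whole sorted result set is streamed in a single response, a reader built this way would not need the page-by-page requests of a regular query, which is where the expected speed-up comes from.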
[jira] [Commented] (BEAM-3947) Add support for Solr 6.x/7.x
[ https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456285#comment-16456285 ]

Cao Manh Dat commented on BEAM-3947:

After taking a look at the current state, I think we must discuss the goal of this issue. If we just want the pipeline to be able to read from Solr, then the current code is fine: it can read data from Solr 5.x, Solr 6.x and Solr 7.x. That is because all the uses of SolrJ (the client for a Solr cluster) in Beam right now are very basic, e.g.:
* Parsing data from ZooKeeper to find out where the Solr nodes live
* Calling some HTTP APIs that have not changed since then

It seems that we should instead focus on using the new features of Solr 6.x and Solr 7.x (we may or may not need to update SolrJ):
* Support the "/export" handler. It will make SolrIO significantly faster, since all the documents are streamed in one response and the cost of retrieving a document's fields is much lower than it is currently (column-oriented vs. row-oriented).
* Allow BoundedSolrSource.split to split the source into arbitrary smaller parts.
[jira] [Assigned] (BEAM-3947) Add support for Solr 6.x/7.x
[ https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cao Manh Dat reassigned BEAM-3947:

Assignee: Cao Manh Dat
[jira] [Commented] (BEAM-3947) Add support for Solr 6.x
[ https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444028#comment-16444028 ]

Cao Manh Dat commented on BEAM-3947:

I think I will have some time to work on this issue; supporting Solr 7.x sounds good to me. However, Solr 7 requires Java 8. Is that OK, [~iemejia]?
[jira] [Created] (BEAM-3018) Remove duplicated methods in StructuredCoder
Cao Manh Dat created BEAM-3018:

Summary: Remove duplicated methods in StructuredCoder
Key: BEAM-3018
URL: https://issues.apache.org/jira/browse/BEAM-3018
Project: Beam
Issue Type: Improvement
Components: sdk-java-core
Reporter: Cao Manh Dat
Assignee: Cao Manh Dat

StructuredCoder has several methods that are identical to those of its parent class. We should remove these.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (BEAM-2657) Create Solr IO
[ https://issues.apache.org/jira/browse/BEAM-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097319#comment-16097319 ]

Cao Manh Dat commented on BEAM-2657:

Thanks [~iemejia]! I should have looked at the old issues first, sorry about that.

BTW, my plans for upcoming SolrIO features are:
- Support more parameters (fl, sort, fq, defType, isExport) for reading from Solr.
- When {{isExport = true}}, use the [exportHandler|https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets], which is much faster at streaming results back to Beam runners.
- Use HashQParserPlugin to support splitting the volume into smaller parts.

> Create Solr IO
>
> Key: BEAM-2657
> URL: https://issues.apache.org/jira/browse/BEAM-2657
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-extensions
> Reporter: Cao Manh Dat
> Assignee: Davor Bonaci
>
> I'm working on a new SolrIO (this component borrows some design ideas from
> ElasticsearchIO) providing both a bounded source and a sink.
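As a rough illustration of the HashQParserPlugin idea in the plan above, the following Python sketch generates one filter per split, so that each split reads a disjoint slice of the result set. It is not SolrIO code and the function name is hypothetical. The {{!hash}} local parameters ({{workers}}, {{worker}}) and the {{partitionKeys}} request parameter are written from my recollection of how the plugin is configured, so treat them as assumptions and check them against the Solr reference documentation before relying on them.

```python
def hash_partition_filters(workers, partition_keys):
    """Return one set of extra request parameters per worker.

    Each entry is meant to be added to the same base query, so that
    worker i only sees documents whose key hash falls into its slice.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not partition_keys:
        raise ValueError("at least one partition key field is required")
    keys = ",".join(partition_keys)
    return [
        {
            # Assumed syntax: {!hash workers=N worker=i}
            "fq": f"{{!hash workers={workers} worker={i}}}",
            # Assumed parameter name for the field(s) to hash on
            "partitionKeys": keys,
        }
        for i in range(workers)
    ]


# Three hypothetical splits, partitioned on the "id" field.
filters = hash_partition_filters(3, ["id"])
```

With this shape, a split method could ask for as many parts as the runner wants and attach one entry of `filters` to each sub-source, instead of being limited to one part per shard.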
[jira] [Created] (BEAM-2657) Create Solr IO
Cao Manh Dat created BEAM-2657:

Summary: Create Solr IO
Key: BEAM-2657
URL: https://issues.apache.org/jira/browse/BEAM-2657
Project: Beam
Issue Type: New Feature
Components: sdk-java-extensions
Reporter: Cao Manh Dat
Assignee: Davor Bonaci

I'm working on a new SolrIO (this component borrows some design ideas from ElasticsearchIO) providing both a bounded source and a sink.