[jira] [Commented] (BEAM-3947) Add support for Solr 6.x/7.x

2018-05-09 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468539#comment-16468539
 ] 

Cao Manh Dat commented on BEAM-3947:


{quote}What do you think about the idea of structuring the {{SolrIO}} so there 
is one built specifically for Solr7 that we can then focus on bringing in new 
features / performance gains from now on?
{quote}
I like this idea, as well as the idea of scoping {{solr-solrj}} as provided. 
What about creating a new one that supports only Solr 6 and 7?

> Add support for Solr 6.x/7.x
> 
>
> Key: BEAM-3947
> URL: https://issues.apache.org/jira/browse/BEAM-3947
> Project: Beam
> Issue Type: Improvement
> Components: io-java-solr
> Reporter: Ismaël Mejía
> Assignee: Cao Manh Dat
> Priority: Minor
>
> The initial PR on Solr was based on Solr 6.x; however, at that time we 
> supported Java 7, so I insisted on moving it to Solr 5.x (which is Java 7 
> compatible). This issue is to add support for multiple versions of Solr, 
> ideally in a single module.
> Note that I was able to recover the original code for Solr 6.x created by 
> [~caomanhdat] here (there are some differences in the way the Split was 
> calculated and maybe some other minor things):
> [https://github.com/iemejia/beam/blob/recover-solrio/sdks/java/io/solr/pom.xml]
> This issue does not cover support for Solr 7, but if it is possible to add it 
> as part of this work, that would be great.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3947) Add support for Solr 6.x/7.x

2018-05-09 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468518#comment-16468518
 ] 

Cao Manh Dat commented on BEAM-3947:


In :
 * solr-solrj-5.x : {{method1}} is used to create a new client
 * solr-solrj-6.x: a new method is introduced, {{method2}} is used to create a 
new client, {{method1}} is deprecated.
 * solr-solrj-7.x: {{method1}} is removed.

I am a bit afraid that if we scope {{solr-solrj}} as provided, users who 
supply solr-solrj-7.x will hit a {{NoSuchMethodError}} (because {{method1}} is 
no longer available).
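
To illustrate the concern, here is a minimal sketch of one possible workaround: probing for the newer factory method at runtime and falling back to the older one. It is only an assumption about how this could be handled, not what SolrIO does. {{OldStyleLibrary}}, {{NewStyleLibrary}} and {{createClient}} are hypothetical stand-ins, and {{method1}}/{{method2}} are the placeholder names used above rather than real SolrJ methods. Note that reflection reports a missing method as a checked {{NoSuchMethodException}}, whereas code compiled against the old method and linked against the new jar fails with {{NoSuchMethodError}}.

```java
import java.lang.reflect.Method;

class ClientFactoryProbe {

    /** Hypothetical stand-in for a 5.x-style client library: only the old factory exists. */
    static class OldStyleLibrary {
        public static String method1(String zkHost) {
            return "client-via-method1:" + zkHost;
        }
    }

    /** Hypothetical stand-in for a 7.x-style client library: the old factory was removed. */
    static class NewStyleLibrary {
        public static String method2(String zkHost) {
            return "client-via-method2:" + zkHost;
        }
    }

    /** Prefers method2 and falls back to method1 when method2 is absent. */
    static String createClient(Class<?> library, String zkHost) {
        try {
            Method factory;
            try {
                factory = library.getMethod("method2", String.class);
            } catch (NoSuchMethodException e) {
                // Older library version: only the original factory is present.
                factory = library.getMethod("method1", String.class);
            }
            // Static method, so the receiver argument is null.
            return (String) factory.invoke(null, zkHost);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable client factory on " + library.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(createClient(OldStyleLibrary.class, "zk:2181"));
        System.out.println(createClient(NewStyleLibrary.class, "zk:2181"));
    }
}
```

The trade-off is that the compiler no longer checks the call, so a separate module per SolrJ major version may still be the cleaner option.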

 



[jira] [Comment Edited] (BEAM-3947) Add support for Solr 6.x/7.x

2018-05-08 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467496#comment-16467496
 ] 

Cao Manh Dat edited comment on BEAM-3947 at 5/8/18 2:46 PM:


{quote}
Cao Manh Dat I am probably overlooking something simple but doesn't 
elasticsearch-tests have a separate module per version which only really differ 
in the test scope dependencies?
https://github.com/apache/beam/tree/master/sdks/java/io/elasticsearch-tests
{quote}
I think we will hit a NoSuchMethodError if we use this approach to test 
against Solr 7.x.

{quote}Isn't the problem not the tests but that we (may) need different builds 
of the SolrIO to support API changes across versions of the SolrJ dependency? I 
gather that is a viable option for IO design when necessary.
{quote}
The point here is that we can use the current SolrIO (which uses SolrJ 5.x) to 
fetch data from and write data to a Solr cluster running 5.x, 6.x, or 7.x, so 
I don't think we need to update the SolrJ dependency.




[jira] [Commented] (BEAM-3947) Add support for Solr 6.x/7.x

2018-05-08 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467496#comment-16467496
 ] 

Cao Manh Dat commented on BEAM-3947:


{quote}Isn't the problem not the tests but that we (may) need different builds 
of the SolrIO to support API changes across versions of the SolrJ dependency? I 
gather that is a viable option for IO design when necessary.
{quote}
The point here is that we can use the current SolrIO (which uses SolrJ 5.x) to 
fetch data from and write data to a Solr cluster running 5.x, 6.x, or 7.x, so 
I don't think we need to update the SolrJ dependency.



[jira] [Commented] (BEAM-3947) Add support for Solr 6.x/7.x

2018-05-08 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467427#comment-16467427
 ] 

Cao Manh Dat commented on BEAM-3947:


After putting a lot of thought into this issue, I think the best strategy is 
for Beam to support as many versions of Solr as possible (5.x, 6.x, 7.x). The 
problem is how to test that. In SolrIOTest.java we can only test one version of 
Solr (currently Solr 5) because the cluster started by the test runs in the 
same JVM as the SolrIO.

We cannot use the same strategy as ElasticsearchIO. Elasticsearch nodes use 
org.elasticsearch.client.transport to communicate with each other, while 
ElasticsearchIO uses org.elasticsearch.client.restclient to communicate with 
the Elasticsearch nodes. In Solr, however, the nodes also use SolrJ to 
communicate, so if we spin up a Solr 6 cluster in SolrIOTest, the SolrJ version 
on the classpath will be SolrJ 6.

The only option we have left is to test the different Solr versions using 
Docker (SolrITTest), so that the test (SolrIO) runs in a different JVM from the 
Solr nodes (just like in production).

Note: the rule in Lucene/Solr is that any method can be removed after two major 
versions (i.e., methods of Solr 5 can be removed in Solr 7).



[jira] [Comment Edited] (BEAM-3947) Add support for Solr 6.x/7.x

2018-04-27 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456285#comment-16456285
 ] 

Cao Manh Dat edited comment on BEAM-3947 at 4/27/18 12:45 PM:
--

After taking a look at the current state, I think we must discuss the goal of 
this issue.

If we just want the pipeline to be able to read from Solr, then the current 
code is fine: it can read/write data from/to Solr 5.x, 6.x, and 7.x, because 
all the uses of SolrJ (the client for a Solr cluster) in Beam right now are 
very basic, e.g.
 * Parsing data from ZooKeeper to know where the Solr nodes live
 * Calling some HTTP APIs that have not changed since then

It seems that we should focus on using the new features of Solr 6.x and 7.x (we 
may or may not need to update SolrJ):
 * Support the "/export" handler. It will make SolrIO significantly faster, 
since all the documents are streamed in one response and the cost of retrieving 
a document's fields is much lower than it is currently (column-oriented vs. 
row-oriented).
 * BoundedSolrSource.split can split the source into an arbitrary number of 
smaller parts.
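
The splitting idea in the last bullet can be sketched as plain hash partitioning. This is a minimal, self-contained illustration that assumes documents are identified by a string id. It is not the actual BoundedSolrSource.split implementation and it does not call Solr; {{HashSplitSketch}}, {{partitionOf}} and {{split}} are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class HashSplitSketch {

    /** Maps a document id to one of numSplits partitions; floorMod keeps the result non-negative. */
    static int partitionOf(String docId, int numSplits) {
        return Math.floorMod(docId.hashCode(), numSplits);
    }

    /** Distributes ids so that each one lands in exactly one partition. */
    static List<List<String>> split(List<String> docIds, int numSplits) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            parts.add(new ArrayList<>());
        }
        for (String id : docIds) {
            parts.get(partitionOf(id, numSplits)).add(id);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            ids.add("doc-" + i);
        }
        List<List<String>> parts = split(ids, 3);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("split " + i + ": " + parts.get(i));
        }
    }
}
```

Because the partition depends only on the id and the number of splits, each reader could evaluate the same rule independently and the resulting parts would not overlap.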

 

 





[jira] [Commented] (BEAM-3947) Add support for Solr 6.x/7.x

2018-04-27 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456285#comment-16456285
 ] 

Cao Manh Dat commented on BEAM-3947:


After taking a look at the current state, I think we must discuss the goal of 
this issue.

If we just want the pipeline to be able to read from Solr, then the current 
code is fine: it can read data from Solr 5.x, 6.x, and 7.x, because all the 
uses of SolrJ (the client for a Solr cluster) in Beam right now are very 
basic, e.g.
 * Parsing data from ZooKeeper to know where the Solr nodes live
 * Calling some HTTP APIs that have not changed since then

It seems that we should focus on using the new features of Solr 6.x and 7.x (we 
may or may not need to update SolrJ):
 * Support the "/export" handler. It will make SolrIO significantly faster, 
since all the documents are streamed in one response and the cost of retrieving 
a document's fields is much lower than it is currently (column-oriented vs. 
row-oriented).
 * BoundedSolrSource.split can split the source into an arbitrary number of 
smaller parts.

 

 



[jira] [Assigned] (BEAM-3947) Add support for Solr 6.x/7.x

2018-04-25 Thread Cao Manh Dat (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao Manh Dat reassigned BEAM-3947:
--

Assignee: Cao Manh Dat



[jira] [Commented] (BEAM-3947) Add support for Solr 6.x

2018-04-19 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444028#comment-16444028
 ] 

Cao Manh Dat commented on BEAM-3947:


I think I will have some time to work on this issue, and supporting Solr 7.x 
sounds good to me. But Solr 7 requires Java 8; is that OK, [~iemejia]?

> Add support for Solr 6.x
> 
>
> Key: BEAM-3947
> URL: https://issues.apache.org/jira/browse/BEAM-3947
> Project: Beam
> Issue Type: Improvement
> Components: io-java-solr
> Reporter: Ismaël Mejía
> Priority: Minor


[jira] [Created] (BEAM-3018) Remove duplicated methods in StructuredCoder

2017-10-04 Thread Cao Manh Dat (JIRA)
Cao Manh Dat created BEAM-3018:
--

 Summary: Remove duplicated methods in StructuredCoder
 Key: BEAM-3018
 URL: https://issues.apache.org/jira/browse/BEAM-3018
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Cao Manh Dat
Assignee: Cao Manh Dat


StructuredCoder has several methods that are identical to those of its parent 
class. We should remove them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2657) Create Solr IO

2017-07-22 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097319#comment-16097319
 ] 

Cao Manh Dat commented on BEAM-2657:


Thanks [~iemejia]! I should have taken a look at old issues first; sorry about 
that.

BTW, my plans for upcoming SolrIO features are:
- Support more parameters ( fl, sort, fq, defType, isExport ) for reading from 
Solr.
- In case of {{isExport = true}}, we will use the 
[exportHandler|https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets], 
which is much faster for streaming results back to Beam runners.
- By using HashQParserPlugin, we can support splitting the volume into smaller 
parts.
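
As a rough illustration of the first two items, the sketch below builds a request URL for the "/export" handler from the q, fl and sort parameters. Solr's export handler expects both fl and sort to be set, and the exported fields need docValues. The base URL, the collection name, and the {{ExportUrlSketch}}/{{exportUrl}} names are hypothetical, and this is not how SolrIO issues requests today.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ExportUrlSketch {

    /** Builds an /export request URL; every argument is supplied by the caller. */
    static String exportUrl(String baseUrl, String collection,
                            String query, String fields, String sort) {
        return baseUrl + "/" + collection + "/export"
                + "?q=" + encode(query)
                + "&fl=" + encode(fields)
                + "&sort=" + encode(sort);
    }

    /** Form-encodes a single parameter value as UTF-8. */
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical host and collection, for illustration only.
        System.out.println(exportUrl("http://localhost:8983/solr", "books",
                "*:*", "id,title", "id asc"));
    }
}
```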

> Create Solr IO
> --
>
> Key: BEAM-2657
> URL: https://issues.apache.org/jira/browse/BEAM-2657
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-extensions
> Reporter: Cao Manh Dat
> Assignee: Davor Bonaci
>
> I'm working on a new SolrIO (this component borrows some design ideas from 
> ElasticsearchIO) providing both a bounded source and a sink.





[jira] [Created] (BEAM-2657) Create Solr IO

2017-07-21 Thread Cao Manh Dat (JIRA)
Cao Manh Dat created BEAM-2657:
--

 Summary: Create Solr IO
 Key: BEAM-2657
 URL: https://issues.apache.org/jira/browse/BEAM-2657
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-extensions
Reporter: Cao Manh Dat
Assignee: Davor Bonaci


I'm working on a new SolrIO (this component borrows some design ideas from 
ElasticsearchIO) providing both a bounded source and a sink.


