[jira] [Commented] (SOLR-14470) Add streaming expressions to /export handler
[ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210921#comment-17210921 ] ASF subversion and git services commented on SOLR-14470: Commit 5fec41e490430240bd2d0d9e54b5c857e82a9bf4 in lucene-solr's branch refs/heads/branch_8x from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5fec41e ] SOLR-14470: Fix test failures by reducing the randomness of test data.

> Add streaming expressions to /export handler
>
> Key: SOLR-14470
> URL: https://issues.apache.org/jira/browse/SOLR-14470
> Project: Solr
> Issue Type: Improvement
> Components: Export Writer, streaming expressions
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Priority: Major
> Fix For: 8.6
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Many streaming scenarios would greatly benefit from the ability to perform
> partial rollups (or other transformations) as early as possible, in order to
> minimize the amount of data that has to be sent from shards to the
> aggregating node.
> This can be implemented as a subset of streaming expressions that process the
> data directly inside each local {{ExportHandler}} and output only the
> records from the resulting stream.
> Conceptually it would be similar to the way Hadoop {{Combiner}} works. As is
> the case with {{Combiner}}, because the input data is processed in batches
> there would be no guarantee that only one record per unique set of sort values
> would be emitted; in fact, in most cases multiple partial aggregations would be
> emitted. Still, in many scenarios this would allow reducing the amount of
> data to be sent by several orders of magnitude.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
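The Combiner-like behavior described in the issue can be illustrated with a small toy model in Python. This is not Solr code: records are assumed to be (sort key, value) pairs already sorted by key, and the batch size and the `sum` rollup are illustrative choices. Each batch is rolled up on its own, so a key that straddles a batch boundary is emitted more than once, and the aggregating node still has to merge the partial results.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (sort key, numeric value), already sorted by key.
Record = Tuple[str, int]


def batches(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Split a shard's sorted output into consecutive fixed-size batches."""
    if size < 1:
        raise ValueError("batch size must be positive")
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def shard_partial_rollup(records: Iterable[Record], batch_size: int) -> Iterator[Record]:
    """Combiner-style step run on each shard: sum values per key within a
    single batch only. A key spanning a batch boundary is emitted again."""
    for batch in batches(records, batch_size):
        sums: Dict[str, int] = {}
        for key, value in batch:
            sums[key] = sums.get(key, 0) + value
        yield from sums.items()  # dicts keep insertion (= sort) order


def aggregate(partials: Iterable[Record]) -> Dict[str, int]:
    """Aggregating node: merge partial sums into one total per key."""
    totals: Dict[str, int] = {}
    for key, value in partials:
        totals[key] = totals.get(key, 0) + value
    return totals


if __name__ == "__main__":
    shard = [("a", 1)] * 4 + [("b", 1)] * 4  # 8 raw records on one shard
    partials = list(shard_partial_rollup(shard, batch_size=3))
    print(partials)             # [('a', 3), ('a', 1), ('b', 2), ('b', 2)]
    print(aggregate(partials))  # {'a': 4, 'b': 4}
```

Here 8 raw records shrink to 4 partials, with "a" emitted twice, matching the "no guarantee of one record per key" caveat. This only works as-is for aggregations whose partial results can be merged (sum, count, min, max); an average, for example, would have to be carried as a (sum, count) pair and divided only at the aggregating node.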
[jira] [Commented] (SOLR-14470) Add streaming expressions to /export handler
[ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210022#comment-17210022 ] Andrzej Bialecki commented on SOLR-14470: - Huh, indeed! Thanks for the find and sorry for the hassle ... Please go ahead and backport it.
[jira] [Commented] (SOLR-14470) Add streaming expressions to /export handler
[ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209557#comment-17209557 ] Munendra S N commented on SOLR-14470: - [~ab] I was looking into an intermittent test failure in TestExportWriter (from the BadApple report). It wasn't failing on master even with beasting, but it did fail on the 8x branch. Comparing the diff, it seems that commit {{a5543dfb5112d12b9616f2d09cd07e9805b177de}} was not backported to 8x. I backported it to 8x locally and beasted it (50 iterations), and there were no test failures. Just checking with you before backporting the test fixes.
[jira] [Commented] (SOLR-14470) Add streaming expressions to /export handler
[ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198512#comment-17198512 ] ASF subversion and git services commented on SOLR-14470: Commit 1160216bfba491218fe45a644f9fda8b557c5b91 in lucene-solr's branch refs/heads/reference_impl_dev from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1160216 ] SOLR-14470: Fix precommit
[jira] [Commented] (SOLR-14470) Add streaming expressions to /export handler
[ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176282#comment-17176282 ] ASF subversion and git services commented on SOLR-14470: Commit a5543dfb5112d12b9616f2d09cd07e9805b177de in lucene-solr's branch refs/heads/master from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a5543df ] SOLR-14470: Fix test failures by reducing the randomness of test data.
[jira] [Commented] (SOLR-14470) Add streaming expressions to /export handler
[ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128962#comment-17128962 ] ASF subversion and git services commented on SOLR-14470: Commit 684c2e6afea0229400eb929ff6a774e6a05fa9e8 in lucene-solr's branch refs/heads/master from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=684c2e6 ] SOLR-14470: Fix precommit
[jira] [Commented] (SOLR-14470) Add streaming expressions to /export handler
[ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128963#comment-17128963 ] ASF subversion and git services commented on SOLR-14470: Commit 88cc68b5260bf1fdf6c5b89e391a19d40e45437e in lucene-solr's branch refs/heads/branch_8x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=88cc68b ] SOLR-14470: Fix precommit
[jira] [Commented] (SOLR-14470) Add streaming expressions to /export handler
[ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128347#comment-17128347 ] ASF subversion and git services commented on SOLR-14470: Commit 107f655a7f256f193ef81bd69658de33549ab0a3 in lucene-solr's branch refs/heads/branch_8x from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=107f655 ] SOLR-14470: Add streaming expressions to /export handler.
[jira] [Commented] (SOLR-14470) Add streaming expressions to /export handler
[ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128322#comment-17128322 ] ASF subversion and git services commented on SOLR-14470: Commit 30924f23d6834605b9bf2d24509755ff61c4e878 in lucene-solr's branch refs/heads/master from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=30924f2 ] SOLR-14470: Add streaming expressions to /export handler.
[jira] [Commented] (SOLR-14470) Add streaming expressions to /export handler
[ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114230#comment-17114230 ] David Smiley commented on SOLR-14470: - Ok; makes sense.
[jira] [Commented] (SOLR-14470) Add streaming expressions to /export handler
[ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112099#comment-17112099 ] Andrzej Bialecki commented on SOLR-14470: - For some reason Jira didn't add a link to the PR: [https://github.com/apache/lucene-solr/pull/1506] The implementation simply reuses the streaming API to process documents just before they are sent out from /export, and it's purely optional: it's used only when the {{expr=}} parameter is specified. I had to do some restructuring of {{ExportWriter}}, so the diff may seem large, but that was also done to increase the reuse of already existing methods. The actual changes to ExportWriter that matter are just 20-some lines that hook up the special streaming shim (ExportWriterStream).
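To make the optional `expr=` parameter more concrete, here is a small Python sketch that only builds an /export request URL, with and without an expression; it does not send anything. The host, collection, field names and the expression `unique(input(),over="reporter")` are illustrative assumptions. In particular, `input()` as the source that reads the documents /export would otherwise write out is my understanding of how the 8.6 feature is exposed, so please check it against the Solr Reference Guide for your version. As with any /export request, `fl` and `sort` must be given and refer to docValues fields.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def export_url(base: str, collection: str, q: str, fl: str, sort: str,
               expr: Optional[str] = None) -> str:
    """Build a /export request URL. `expr` is added only when given,
    mirroring the purely optional behavior described in the comment."""
    params = {"q": q, "fl": fl, "sort": sort}
    if expr is not None:
        # Hypothetical local expression; `input()` is assumed to read the
        # documents that /export would otherwise write out.
        params["expr"] = expr
    # urlencode percent-encodes quotes, parentheses and commas in the expression.
    return f"{base.rstrip('/')}/{collection}/export?{urlencode(params)}"


if __name__ == "__main__":
    url = export_url(
        "http://localhost:8983/solr", "collection1",  # hypothetical host/collection
        q="*:*", fl="reporter", sort="reporter asc",
        expr='unique(input(),over="reporter")',
    )
    print(url)
    # Decode the query string again to check what the server would receive.
    print(parse_qs(urlsplit(url).query)["expr"])  # ['unique(input(),over="reporter")']
```

Leaving `expr` out yields a plain /export request, so existing clients are unaffected unless they opt in.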
[jira] [Commented] (SOLR-14470) Add streaming expressions to /export handler
[ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111458#comment-17111458 ] David Smiley commented on SOLR-14470: - Sounds great, but hopefully it can be done in a layered way. "/export" has a straightforward purpose. Adding aggregations _directly_ to it concerns me; it would then no longer be a straightforward component.