[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/17039

OK, there were a couple of similar issues, such as in-set-operations query 9 and group-analytics.sql.out queries 21 and 22.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/17039

OK, so here is an example of output I'm not sure is correct, from in-order-by:

-- !query 17
SELECT Count(DISTINCT( t1a )), t1b
FROM t1
WHERE t1h NOT IN (SELECT t2h FROM t2 where t1a = t2a order by t2d DESC nulls first)
GROUP BY t1a, t1b
ORDER BY t1b DESC nulls last
-- !query 17 schema
struct<count(DISTINCT t1a):bigint,t1b:smallint>
-- !query 17 output
1 10
1 10
1 16
1 6
1 8
1 NULL

That is the "new" output with your change, but it doesn't actually match what you'd expect from that query (it isn't t1b DESC), which would be:

1 16
1 10
1 10
1 8
1 6
1 NULL
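The expected ordering quoted above (t1b DESC with NULLs last) is easy to sanity-check outside Spark. A minimal standalone sketch in plain Java, not Spark code; the values are the t1b column from the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullsLastDemo {
    // Sort integers descending with NULLs last, mirroring
    // "ORDER BY t1b DESC nulls last" from the query above.
    static List<Integer> sortDescNullsLast(List<Integer> values) {
        List<Integer> copy = new ArrayList<>(values);
        copy.sort(Comparator.nullsLast(Comparator.<Integer>reverseOrder()));
        return copy;
    }

    public static void main(String[] args) {
        // t1b values from the "new" output, which arrived unsorted.
        List<Integer> t1b = Arrays.asList(10, 10, 16, 6, 8, null);
        // Prints [16, 10, 10, 8, 6, null]: the order the comment expects.
        System.out.println(sortDescNullsLast(t1b));
    }
}
```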
[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/17039

@hvanhovell So I backed out the changes in this PR, implemented your change to SQLQueryTestSuite.getNormalizedResult, regenerated the golden results files, and the tests all pass on my x86 and big-endian platforms. Results files that were changed:

sql/core/src/test/resources/sql-tests/results/group-analytics.sql.out
sql/core/src/test/resources/sql-tests/results/order-by-nulls-ordering.sql.out
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-joins.sql.out
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-order-by.sql.out
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-joins.sql.out

So, should I abandon this PR and go with your solution? I can submit your change in a PR along with updated results files if you want.
[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/17039

I think that the current "order if not currently ordered" behaviour in the test suite is good for checking the set of results for unordered queries. If a query is ordered at all, then the results should be deterministic, given that the input data and query are part of the test; otherwise it is a bad test. So I think this PR is the way to go.
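The "order if not currently ordered" approach described above can be sketched as follows: for queries with no ORDER BY, sort both actual and expected rows before comparing, so only the set of rows matters. This is a hypothetical illustration, not the actual SQLQueryTestSuite code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ResultCompareDemo {
    // For unordered queries, compare results as sorted lists so a
    // platform-dependent row order cannot fail the test; ordered
    // queries would be compared as-is instead.
    static boolean sameRows(List<String> actual, List<String> expected) {
        List<String> a = actual.stream().sorted().collect(Collectors.toList());
        List<String> e = expected.stream().sorted().collect(Collectors.toList());
        return a.equals(e);
    }

    public static void main(String[] args) {
        List<String> littleEndian = Arrays.asList("1 10", "1 16", "1 8");
        List<String> bigEndian = Arrays.asList("1 16", "1 10", "1 8");
        // Same rows, different arrival order: treated as equal.
        System.out.println(sameRows(littleEndian, bigEndian)); // true
    }
}
```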
[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/17039

Jenkins retest please
[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/17039

@gatorsmile I'm glad it wasn't just me that found it complex ;-) I've modified the patch to remove an unnecessary change, as that query was not ordered and the test suite code handles that case.
[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/17039

@hvanhovell @gatorsmile I agree that would be a better solution; however, I don't know how to achieve that, being unfamiliar with this code.
[GitHub] spark pull request #17039: [SPARK-19710] Fix ordering of rows in query resul...
GitHub user robbinspg opened a pull request: https://github.com/apache/spark/pull/17039

[SPARK-19710] Fix ordering of rows in query results

## What changes were proposed in this pull request?

Changes to SQLQueryTests to make the order of the results constant. Where possible, ORDER BY has been added to match the existing expected output.

## How was this patch tested?

Test runs on x86, zLinux (big endian), ppc (big endian)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/robbinspg/spark-1 SPARK-19710

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17039.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17039

commit fbc46a6f5ec2a4aaf3c6b4d5d776ccc7b114842f (Pete Robbins <robbin...@gmail.com>, 2016-12-21T10:06:41Z): o.a.s.unsafe.types.UTF8StringSuite.writeToOutputStreamIntArray test fails on big endian. Only change byte order on little endian
commit 30e20be2c199cc57a6a85547770dfa6fc3d32752 (Pete Robbins <robbin...@gmail.com>, 2016-12-22T09:14:14Z): Simplify setting of byte order
commit 145c76a2ce4b53726c209f04e0a230692b395369 (Pete Robbins <robbin...@gmail.com>, 2016-12-22T16:08:38Z): Merge branch 'master' of https://github.com/apache/spark.git
commit f0e77f29f1dca2198a87efa28fb01fd247162ceb (Pete Robbins <robbin...@gmail.com>, 2016-12-23T08:55:15Z): Merge branch 'master' of https://github.com/apache/spark.git
commit 1bc1adf48dae6b1047ff4d4e3d467ffec88abe12 (Pete Robbins <robbin...@gmail.com>, 2016-12-23T08:56:42Z): remove redundant comment
commit ea259fc7a00b3aab1dd554ec9850d057407b1875 (Pete Robbins <robbin...@gmail.com>, 2017-01-03T10:29:01Z): Merge branch 'master' of https://github.com/apache/spark.git
commit f4b76a779df4c9b952114a59907a27a2eddd9898 (Pete Robbins <robbin...@gmail.com>, 2017-02-03T13:21:41Z): Merge branch 'master' of https://github.com/apache/spark.git
commit b5571ea47408f2027a73368adf763b0f0d60eba0 (Pete Robbins <robbin...@gmail.com>, 2017-02-13T13:13:05Z): Merge branch 'master' of https://github.com/apache/spark.git
commit 191777387b4b76afd8442ddc9b33815bd487dfe6 (Pete Robbins <robbin...@gmail.com>, 2017-02-14T10:57:47Z): Merge branch 'master' of https://github.com/apache/spark.git
commit a832b740a5e29b25e8e31cd5de73a311f855a12d (Pete Robbins <robbin...@gmail.com>, 2017-02-20T13:18:05Z): Merge branch 'master' of https://github.com/apache/spark.git
commit bafe31ccbefdc80d434e34baae8600cb6ef26c56 (Pete Robbins <robbin...@gmail.com>, 2017-02-23T11:29:32Z): Merge branch 'master' of https://github.com/apache/spark.git
commit 950415f98c4574532da9dd089e5a9b027d3683d8 (Pete Robbins <robbin...@gmail.com>, 2017-02-23T11:33:53Z): Update tests to produce reliably ordered results
[GitHub] spark issue #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN s...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/16841

OK, I'll raise a separate JIRA, document the differences, and submit a PR.
[GitHub] spark issue #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN s...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/16841

@kevinyu98 Several of the new tests fail on big-endian platforms. It appears that rows are returned in a slightly different order but are still a correct output from the query. For example, in-joins query 4:

-- !query 4
SELECT Count(DISTINCT(t1a)), t1b, t3a, t3b, t3c
FROM t1 natural left JOIN t3
WHERE t1a IN (SELECT t2a FROM t2 WHERE t1d = t2d)
AND t1b > t3b
GROUP BY t1a, t1b, t3a, t3b, t3c
ORDER BY t1a DESC

on little endian returns:

1 10 val3b 8 NULL
1 10 val1b 8 16
1 10 val3a 6 12
1 8 val3a 6 12
1 8 val3a 6 12

whereas on big endian it returns:

1 10 val3a 6 12
1 10 val3b 8 NULL
1 10 val1b 8 16
1 8 val3a 6 12
1 8 val3a 6 12

I believe GROUP BY does not define any ordering, so both of these outputs are valid for the query, as the ORDER BY is only on t1a; but obviously the big-endian output does not match your expected output, so it fails. I'm trying to determine why the execution on big endian returns the rows in a different order.
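One plausible source of such a difference, offered as an illustration rather than a diagnosis of Spark's actual aggregate: a hash aggregate emits groups in hash order, and any hashing that is sensitive to byte layout can legally reorder otherwise identical results:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class GroupByOrderDemo {
    // Group rows by key with a hash map, as a hash aggregate does.
    // The emission order depends on hash codes, not input order, so two
    // platforms that hash binary row data differently (e.g. byte-order
    // sensitive hashing) can emit the same groups in different orders.
    static List<String> groupKeys(List<String> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String r : rows) {
            counts.merge(r, 1, Integer::sum);
        }
        return new ArrayList<>(counts.keySet());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("val3a", "val3b", "val1b", "val3a");
        List<String> groups = groupKeys(rows);
        // The *set* of groups is fixed even though their order is not.
        System.out.println(new HashSet<>(groups).equals(
            new HashSet<>(Arrays.asList("val3a", "val3b", "val1b"))));
    }
}
```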
[GitHub] spark pull request #16795: [SPARK-19409][BUILD] Fix ParquetAvroCompatibility...
Github user robbinspg commented on a diff in the pull request: https://github.com/apache/spark/pull/16795#discussion_r99622258

--- Diff: sql/core/pom.xml ---
@@ -130,6 +130,12 @@
   test
+  org.apache.avro
--- End diff --

Is this only a test dependency? I see avro.version declared in the top-level pom.xml.
[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/16751

Sorry, I've been away for the weekend. Yes, we use Maven for our test runs. Looks like you have it under control. Thanks.
[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/16751

Since this commit our test runs are failing with ParquetAvroCompatibilitySuite:

*** RUN ABORTED ***
java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
  at org.apache.parquet.avro.AvroParquetWriter.writeSupport(AvroParquetWriter.java:144)
  at org.apache.parquet.avro.AvroParquetWriter.access$100(AvroParquetWriter.java:35)
  at org.apache.parquet.avro.AvroParquetWriter$Builder.getWriteSupport(AvroParquetWriter.java:173)

Does the avro.version also need to be bumped to 1.8.x?
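For context, the kind of change being asked about would be a version-property bump in the parent pom. A hypothetical sketch: the property name avro.version appears in the comment above, but the exact version and placement here are assumptions (org.apache.avro.LogicalType was introduced in Avro 1.8):

```xml
<!-- Hypothetical: bump the Avro version property in the top-level pom.xml
     so parquet-avro 1.8.x can resolve org.apache.avro.LogicalType. -->
<properties>
  <avro.version>1.8.1</avro.version>
</properties>
```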
[GitHub] spark issue #16375: [SPARK-18963] o.a.s.unsafe.types.UTF8StringSuite.writeTo...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/16375

Test run is failing with an unrelated error
[GitHub] spark issue #16375: [SPARK-18963] o.a.s.unsafe.types.UTF8StringSuite.writeTo...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/16375

Jenkins retest this please
[GitHub] spark pull request #16375: [SPARK-18963] o.a.s.unsafe.types.UTF8StringSuite....
Github user robbinspg commented on a diff in the pull request: https://github.com/apache/spark/pull/16375#discussion_r93587770

--- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java ---
@@ -591,7 +591,11 @@ public void writeToOutputStreamIntArray() throws IOException {
   // verify that writes work on objects that are not byte arrays
   final ByteBuffer buffer = StandardCharsets.UTF_8.encode("大千世界");
   buffer.position(0);
-  buffer.order(ByteOrder.LITTLE_ENDIAN);
+
--- End diff --

OK, I'll change it to that.
[GitHub] spark pull request #16375: [SPRK-18963] o.a.s.unsafe.types.UTF8StringSuite.w...
GitHub user robbinspg opened a pull request: https://github.com/apache/spark/pull/16375

[SPRK-18963] o.a.s.unsafe.types.UTF8StringSuite.writeToOutputStreamIntArray test fails on big endian. Only change byte order on little endian

## What changes were proposed in this pull request?

Only change byte order on little endian, so the writeToOutputStreamIntArray test also passes on big-endian platforms.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/robbinspg/spark-1 SPARK-18963

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16375.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16375

commit fbc46a6f5ec2a4aaf3c6b4d5d776ccc7b114842f (Pete Robbins <robbin...@gmail.com>, 2016-12-21T10:06:41Z): o.a.s.unsafe.types.UTF8StringSuite.writeToOutputStreamIntArray test fails on big endian. Only change byte order on little endian
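The one-line fix named in the title ("only change byte order on little endian") amounts to guarding the order() call on the platform's native order. A standalone sketch, illustrative rather than the actual UTF8StringSuite code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class EndianGuardDemo {
    // Flip the buffer to little-endian only on little-endian hosts;
    // big-endian hosts keep ByteBuffer's default BIG_ENDIAN order.
    // Unconditionally forcing LITTLE_ENDIAN (as the original test did)
    // silently reinterprets multi-byte reads on big-endian platforms.
    static ByteBuffer guardedOrder(ByteBuffer buffer) {
        if (ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN) {
            buffer.order(ByteOrder.LITTLE_ENDIAN);
        }
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode("spark");
        buffer.position(0);
        guardedOrder(buffer);
        // Either way, the buffer order now matches the platform's native order.
        System.out.println(buffer.order().equals(ByteOrder.nativeOrder())); // true
    }
}
```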
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/15307

This PR seems to cause intermittent test failures, e.g.: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/1736/testReport/junit/org.apache.spark.sql.streaming/StreamingQueryListenerSuite/single_listener__check_trigger_statuses/
[GitHub] spark issue #15464: [SPARK-17827][SQL]maxColLength type should be Int for St...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/15464

Tests all pass on big-endian with this PR
[GitHub] spark issue #15464: [SPARK-17827][SQL]maxColLength type should be Int for St...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/15464

This PR contains a change to o.a.s.sql.hive.StatisticsSuite which I believe should fix that issue (awaiting big-endian build to complete)
[GitHub] spark pull request #15464: [SPARK-17827][SQL]maxColLength type should be Int...
GitHub user robbinspg opened a pull request: https://github.com/apache/spark/pull/15464

[SPARK-17827][SQL] maxColLength type should be Int for String and Binary

## What changes were proposed in this pull request?

Correct the expected type from the Length function to be Int.

## How was this patch tested?

Test runs on little-endian and big-endian platforms

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/robbinspg/spark-1 SPARK-17827

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15464.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15464

commit 559c3b905bb4a95e880051a7066438705bb1ecfd (Pete Robbins <robbin...@gmail.com>, 2016-10-13T12:58:40Z): Max length returns an Int for String and binary
[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/13652

Also failing here in the UK:

- to UTC timestamp *** FAILED ***
  "2016-03-13 [02]:00:00.0" did not equal "2016-03-13 [10]:00:00.0" (DateTimeUtilsSuite.scala:506)
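Failures like this are typically a daylight-saving artifact: 2016-03-13 02:00 falls inside the US spring-forward gap, so a local-time-to-UTC conversion of that wall-clock time is inherently timezone-dependent. A minimal illustration with java.time, not Spark's own DateTimeUtils:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DstGapDemo {
    // 2016-03-13 02:00 does not exist in America/Los_Angeles: clocks
    // jump from 02:00 straight to 03:00. java.time resolves such a gap
    // by shifting the local time forward by the gap's length.
    static ZonedDateTime resolveInLosAngeles(LocalDateTime local) {
        return local.atZone(ZoneId.of("America/Los_Angeles"));
    }

    public static void main(String[] args) {
        ZonedDateTime resolved =
            resolveInLosAngeles(LocalDateTime.of(2016, 3, 13, 2, 0));
        // The "02:00" wall-clock time silently becomes 03:00.
        System.out.println(resolved.toLocalTime()); // 03:00
    }
}
```

A test machine in a different timezone (here, the UK) resolves the same wall-clock time to a different UTC instant, which is why the suite's expected string does not match.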
[GitHub] spark issue #13707: [WIP][SPARK-15822][SQL] avoid UTF8String references into...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/13707

So clearly that code doesn't work when the type is a primitive. I'm not familiar with the code generation. Is there a way to detect the type during generation rather than generating the dodgy instanceof check?
[GitHub] spark issue #13707: [SPARK-15822][SQL] avoid UTF8String references into free...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/13707

@davies @hvanhovell Can you take a look at this? I'm not sure it is the best fix. Also, are there any other types (structs, arrays, etc.) that are created by pointing into an UnsafeRow and could equally be affected?
[GitHub] spark pull request #13707: [SPARK-15822][SQL] avoid UTF8String references in...
GitHub user robbinspg opened a pull request: https://github.com/apache/spark/pull/13707

[SPARK-15822][SQL] avoid UTF8String references into freed pages

## What changes were proposed in this pull request?

In SMJ codegen we need to save copies of UTF8String values, as the final iterator.next() will free the underlying memory page.

## How was this patch tested?

The test application described in SPARK-15822 now passes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/robbinspg/spark-1 master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13707.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13707

commit d201034e628456b9640eb8483da849217aa80c92 (Pete Robbins <robbin...@gmail.com>, 2016-06-16T14:33:31Z): Create copy of UTF8String in SMJ
commit 1288f1e88f90b751cd0640864d7cc6cb5a9dfeca (Pete Robbins <robbin...@gmail.com>, 2016-06-16T15:21:23Z): make copy() public
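The bug class being fixed here can be shown without Spark: an object that merely aliases a reusable buffer turns into garbage once the buffer is rewritten, while a defensive copy survives. A minimal illustration that assumes nothing about Spark's actual memory pages:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SharedBufferDemo {
    // Defensively copy bytes out of a reusable page, the way the PR
    // copies UTF8String values before the page is freed or reused.
    static byte[] defensiveCopy(byte[] page, int length) {
        return Arrays.copyOf(page, length);
    }

    public static void main(String[] args) {
        // A reusable page, like the memory page backing an UnsafeRow.
        byte[] page = new byte[8];

        // First "row": write it, then keep both an alias and a copy.
        System.arraycopy("AAAA".getBytes(StandardCharsets.UTF_8), 0, page, 0, 4);
        byte[] view = page;                    // alias: points into the page
        byte[] copy = defensiveCopy(page, 4);  // independent copy

        // The iterator advances and the page is rewritten in place.
        System.arraycopy("BBBB".getBytes(StandardCharsets.UTF_8), 0, page, 0, 4);

        // The alias now reflects the *second* row's bytes, while the
        // defensive copy still holds the first row.
        System.out.println(new String(view, 0, 4, StandardCharsets.UTF_8)); // BBBB
        System.out.println(new String(copy, StandardCharsets.UTF_8));       // AAAA
    }
}
```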
[GitHub] spark issue #13589: [SPARK-15822][SPARK-15825][SQL] Fix SMJ Segfault/Invalid...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/13589

As Adam says, I still get the segv with OpenJDK on Linux amd64 running our app. This fix does appear to fix the issue reported in https://issues.apache.org/jira/browse/SPARK-15825
[GitHub] spark issue #13355: [SPARK-15606][core] Use non-blocking removeExecutor call...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/13355

@zsxwing ok to merge now?
[GitHub] spark issue #13355: [SPARK-15606][core] Use non-blocking removeExecutor call...
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/13355

Test suite removed
[GitHub] spark pull request: [SPARK-15606][core] Use a minimum of 3 dispatcher thread...
Github user robbinspg commented on a diff in the pull request: https://github.com/apache/spark/pull/13355#discussion_r65332472

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala ---
@@ -38,7 +38,8 @@ class BlockManagerMaster(
   /** Remove a dead executor from the driver endpoint. This is only called on the driver side. */
   def removeExecutor(execId: String) {
-    tell(RemoveExecutor(execId))
--- End diff --

OK, so I've added a new removeExecutorAsync method to minimise side effects.
[GitHub] spark pull request: [SPARK-15606][core] Use a minimum of 3 dispatcher thread...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/13355

Reverted the original fix and replaced it with a non-blocking call in BlockManagerMaster.removeExecutor. Also added a new test suite that runs DistributedSuite forcing the number of dispatcher threads to 2. This suite will fail without the fix.
[GitHub] spark pull request: [SPARK-15606][core] Use a minimum of 3 dispatcher thread...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/13355

OK, that's what I tried, but it threw up some errors in some other tests, which I'm investigating.
[GitHub] spark pull request: [SPARK-15606][core] Use a minimum of 3 dispatcher thread...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/13355

@zsxwing Do you mean change BlockManagerMaster.removeExecutor to send the message using send (fire-and-forget) rather than askWithRetry?
[GitHub] spark pull request: [SPARK-15606][core] Use a minimum of 3 dispatc...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/13355#issuecomment-09844 agreed. I'll take a look.
[GitHub] spark pull request: [SPARK-15606][core] Use a minimum of 3 dispatc...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/13355#issuecomment-222104521 Although this patch resolves this particular issue I would echo the comment in https://github.com/apache/spark/pull/11728 by @zsxwing: "However, the root cause is there are blocking calls in the event loops but not enough threads. This could happen in other places (such as netty, akka). Ideally, we should avoid blocking calls in all event loops. However, it's hard to figure out all of them in the huge code bases :("
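The quoted failure mode is easy to reproduce in miniature. The sketch below uses plain java.util.concurrent, not Spark's dispatcher: a single-threaded "event loop" runs a task that blocks waiting on a second task queued behind it on the same (only) thread, so neither can ever make progress.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class EventLoopDeadlock {
    // Returns true if the outer task fails to finish within the timeout,
    // i.e. the loop deadlocked.
    static boolean deadlocks() throws Exception {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        Future<String> outer = loop.submit(() -> {
            // Blocking call inside the event loop: the inner task needs the
            // same dispatcher thread we are currently occupying.
            Future<String> inner = loop.submit(() -> "done");
            return inner.get();  // never completes on a 1-thread loop
        });
        try {
            outer.get(500, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        } finally {
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(deadlocks() ? "deadlocked" : "completed");  // prints "deadlocked"
    }
}
```

Raising the minimum thread count, as this PR does, masks this particular instance; replacing the blocking call with a non-blocking one removes it.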
[GitHub] spark pull request: Use a minimum of 3 dispatcher threads to avoid...
GitHub user robbinspg opened a pull request: https://github.com/apache/spark/pull/13355 Use a minimum of 3 dispatcher threads to avoid deadlocks ## What changes were proposed in this pull request? Set minimum number of dispatcher threads to 3 to avoid deadlocks on machines with only 2 cores ## How was this patch tested? Spark test builds You can merge this pull request into a Git repository by running: $ git pull https://github.com/robbinspg/spark-1 SPARK-13906 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13355.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13355 commit 139e87558d728c5ae4ccf297c1702a73d5573335 Author: Pete Robbins <robbin...@gmail.com> Date: 2016-05-27T09:32:49Z Use a minimum of 3 dispatcher threads to avoid deadlocks
[GitHub] spark pull request: [Spark-15154][SQL] Change key types to Long in...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/13009#issuecomment-218105338 Should I add an assert into LongHashedRelation.apply to validate the key, and a test to cover this?
[GitHub] spark pull request: [Spark-15154][SQL] Change key types to Long in...
GitHub user robbinspg opened a pull request: https://github.com/apache/spark/pull/13009 [Spark-15154][SQL] Change key types to Long in tests ## What changes were proposed in this pull request? As reported in the Jira the 2 tests changed here are using a key of type Integer where the Spark sql code assumes the type is Long. This PR changes the tests to use the correct key types. ## How was this patch tested? Test builds run on both Big Endian and Little Endian platforms You can merge this pull request into a Git repository by running: $ git pull https://github.com/robbinspg/spark-1 HashedRelationSuiteFix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13009.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13009 commit 96a0dff9e08727d6885f2c5b8c30ec1281714ce6 Author: Pete Robbins <robbin...@gmail.com> Date: 2016-05-05T12:19:31Z Change key types to Long in tests
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-216360442 Many thanks!
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-215690156 Sorry to keep bugging you on this but I'd really like to fix this major issue and move on. If there are no objections to merging this into master could a committer please do the honours. I don't want to have to create and maintain a Big Endian fork if possible. Cheers
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-214981491 @rxin @hvanhovell is there anything preventing this being merged? IMHO the jira it is fixing is a blocking defect
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-214629761 @hvanhovell Spark 1.6.1 is fine on BE. The issues have been with new functionality added for Spark 2.0. This PR fixes the major issue. There are a few other issues which need investigating, which may be flaky tests, but this is the most important.
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-214429928 @hvanhovell Can we merge this now? I agree the benchmarks should run after a steady state is achieved. Also I'll probably create a change to allow the Benchmark to output in csv format!
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-213670252 Can we re-test this as I think there was a minor change since the test build.
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12501#issuecomment-213670198 Closing this in favour of the other implementation.
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg closed the pull request at: https://github.com/apache/spark/pull/12501
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-213482568 @rxin Do you think we can merge in this PR?
[GitHub] spark pull request: [SPARK-14848][SQL] Compare as Set in DatasetSu...
GitHub user robbinspg opened a pull request: https://github.com/apache/spark/pull/12610 [SPARK-14848][SQL] Compare as Set in DatasetSuite - Java encoder ## What changes were proposed in this pull request? Change test to compare sets rather than sequence ## How was this patch tested? Full test runs on little endian and big endian platforms You can merge this pull request into a Git repository by running: $ git pull https://github.com/robbinspg/spark-1 DatasetSuiteFix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12610.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12610 commit 9203f72155aa5fe4e7ebba158591d6961035371a Author: Pete Robbins <robbin...@gmail.com> Date: 2016-04-22T12:19:15Z Compare as Set in DatasetSuite - Java encoder
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-212930732 @hvanhovell Here are the test results running 10x the size: [ParquetReadBenchmarks.txt](https://github.com/apache/spark/files/230027/ParquetReadBenchmarks.txt) I'm not sure there is a lot in it and if it were up to me I'd go with the implementation in this PR rather than the subclassing.
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-212789051 @hvanhovell Any thoughts/interpretations on those benchmark results? I think the differences are all within the bounds of randomness!
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-212404982 [ParquetReadBenchmark-PartitionedTable.txt](https://github.com/apache/spark/files/227908/ParquetReadBenchmark-PartitionedTable.txt)
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-212388759 Averaged results for 5 runs for first 3 benchmarks: [ParquetReadBenchmark.txt](https://github.com/apache/spark/files/227827/ParquetReadBenchmark.txt)
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-212358250 @hvanhovell Yes I will. I'm trying to get a stable base benchmark first, as running the ParquetReadBenchmark repeatedly against the base code (before either PR) I get what looks like a 10% variation in results. The same is true with either of the PRs applied, so I will average out the runs. As far as subclassing goes I would expect performance on LE to remain the same, as the code path should be identical once the classes are instantiated. Performance on BE between the 2 approaches is another thing, but not the major concern at the moment as we are going from failing/exceptions thrown to "working".
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-212024789 Alternative implementation in https://github.com/apache/spark/pull/12501
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
GitHub user robbinspg opened a pull request: https://github.com/apache/spark/pull/12501 [SPARK-13745][SQL]Support columnar in memory representation on Big Endian platforms - implement by subclassing ## What changes were proposed in this pull request? An alternative implementation of https://github.com/apache/spark/pull/12397 which uses subclasses to minimize any potential performance hits on Little Endian. The parquet datasource and ColumnarBatch tests fail on big-endian platforms; this patch adds support for the little-endian byte arrays being correctly interpreted on a big-endian platform. ## How was this patch tested? Spark test builds ran on big endian z/Linux and a regression build on little endian amd64 You can merge this pull request into a Git repository by running: $ git pull https://github.com/robbinspg/spark-1 bigEndianViaSubclass Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12501.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12501 commit 1fc048385fb0fea93eef85f614586448a3ea7c2a Author: Pete Robbins <robbin...@gmail.com> Date: 2016-04-14T13:50:34Z Support columnar in memory representation on Big Endian platforms commit 3eb481d8c30639c5b9a219e4891ccaccf73075b0 Author: Pete Robbins <robbin...@gmail.com> Date: 2016-04-14T19:24:48Z Use ByteBuffer.wrap instead of allocate commit 69fc667266c5efe97f796c6b4e8d14470168867d Author: Pete Robbins <robbin...@gmail.com> Date: 2016-04-15T10:10:09Z Fix offsets commit a1f06106d321ca40bef6dcd7865484fd79976b08 Author: Pete Robbins <robbin...@gmail.com> Date: 2016-04-15T11:55:21Z Wrap byte array once commit a652865e9f59ca4cf4fc596141ae0511284462b4 Author: Pete Robbins <robbin...@gmail.com> Date: 2016-04-15T12:06:37Z remove trailing spaces commit 804740c9dbe3c4bbe8145cc119c997531634ebb1 Author: Pete Robbins <robbin...@gmail.com> Date: 2016-04-18T15:05:48Z Merge branch 'master' of https://github.com/apache/spark.git into apache-master commit f109bda995a70be8787fd10f414a5be2125d97b2 Author: Pete Robbins <robbin...@gmail.com> Date: 2016-04-19T09:04:41Z Merge branch 'master' of https://github.com/apache/spark.git into apache-master commit d7cbc84e1dfdae1036345956ea8f210c5c982b3d Author: Pete Robbins <robbin...@gmail.com> Date: 2016-04-19T14:13:27Z Big endian implementation using subclassing commit 648b7ac9fa0f7466e05151d5f07a1e613a682bfb Author: Pete Robbins <robbin...@gmail.com> Date: 2016-04-19T14:32:19Z missing else clause
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-211952868 So I have run the ParquetReadBenchmark several times before this PR and after. I'm not sure how to interpret the results though, as there is quite a variation in results on the same code base. I've also implemented the fix via subclassing, which should not affect the Little Endian code path at all, and will benchmark that and get back with the results.
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-211565728 will do
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-211563273 @rxin so what do we need to do to get this into 2.0.0? Although the JIRA is of type "improvement" I could argue that it is a blocking defect as Spark has supported Big Endian platforms up to now. I'm happy to re-write the patch as loading separate subclassed implementations of the On/OffHeapColumnVector and VectorizedPlainValuesReader but that is a far more complex fix and given the code paths I'd doubt any measurable performance change.
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-211247119 I haven't run any explicit performance tests for this. Do we have any specific to this area? Using the static final boolean is allowing the JIT to eliminate the dead code path. I discussed an alternative implementation with @nongli in https://github.com/apache/spark/pull/10628#issuecomment-205993243 and the if(...) implementation was deemed ok.
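The pattern being described could be sketched as below. Names are illustrative, and the portable ByteBuffer read stands in for the Platform.getInt fast path used in the real code; because the flag is a static final primitive, the JIT can prove one branch dead and eliminate it, so the check costs nothing on the hot path.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianGuard {
    // Resolved once at class-init time; static final lets the JIT treat the
    // unreachable branch as dead code on either platform.
    private static final boolean IS_BIG_ENDIAN =
        ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);

    // Reads a little-endian int from buf regardless of the platform's
    // native byte order.
    static int readIntLittleEndian(byte[] buf, int offset) {
        if (IS_BIG_ENDIAN) {
            // BE platform: interpret the bytes explicitly as little-endian.
            return ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
        } else {
            // LE platform: a native-order read already matches the on-disk
            // layout (Platform.getInt in the actual code).
            return ByteBuffer.wrap(buf).order(ByteOrder.nativeOrder()).getInt(offset);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0x78, 0x56, 0x34, 0x12};  // 0x12345678 in LE byte order
        System.out.println(Integer.toHexString(readIntLittleEndian(data, 0)));  // prints "12345678"
    }
}
```

Both branches return the same value on any host, so the guard changes only which machinery performs the read.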
[GitHub] spark pull request: [WIP][SPARK-13745][SQL]Support columnar in mem...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-210549840 @nongli So I've changed the patch to wrap the buffer in initFromPage but only for Big Endian. I'd like to see this patch get in to 2.0.0 so Big Endian platforms are not broken. We can discuss refactoring the code to always use a ByteBuffer rather than byte[] later?
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on a diff in the pull request: https://github.com/apache/spark/pull/12397#discussion_r59863075 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java --- @@ -31,6 +33,8 @@ private byte[] buffer; private int offset; private int bitOffset; // Only used for booleans. + + private final static boolean bigEndianPlatform = ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN); --- End diff -- So the first step is to get this functional on big endian without affecting the little endian implementation asap for inclusion in 2.0.0. In VectorizedPlainValuesReader we could wrap the buffer once (assuming it never gets re-allocated) and use the ByteBuffer for the big endian access. We could always use the ByteBuffer rather than the byte[] even in the little endian implementation, but I do not know what performance impact that would have and, as stated above, I did not want to mess around with the little endian implementation at this time. Of course the methods in On/OffHeapColumnVector that take the byte[] as a parameter could also be changed to take the byte buffer instead. There is also the unnecessary subtracting/adding of Platform.BYTE_ARRAY_OFFSET around a lot of the method calls, e.g. public final void readIntegers(int total, ColumnVector c, int rowId) { c.putIntsLittleEndian(rowId, total, buffer, offset - Platform.BYTE_ARRAY_OFFSET); offset += 4 * total; } where the first thing that putIntsLittleEndian does is to add that back on to the offset. So I think for now I'm reasonably happy with the big endian support in this patch and we could maybe review the whole code structure later. I've improved the patch and will maybe make the change to VectorizedPlainValuesReader so the code in there only wraps the buffer once.
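The "wrap the buffer once" idea could look roughly like this. This is a hypothetical reduced reader, not the actual VectorizedPlainValuesReader: the page's byte[] is wrapped in a single little-endian ByteBuffer up front, and every subsequent read goes through that view instead of allocating a fresh buffer per value.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PlainValuesReaderSketch {
    private ByteBuffer view;  // little-endian view over the page bytes
    private int offset;

    // Wrap once per page; valid as long as the backing array is not
    // re-allocated.
    void initFromPage(byte[] page, int start) {
        view = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
        offset = start;
    }

    int readInt() {
        int v = view.getInt(offset);  // absolute read, no per-value wrap
        offset += 4;
        return v;
    }

    double readDouble() {
        double v = view.getDouble(offset);
        offset += 8;
        return v;
    }

    public static void main(String[] args) {
        byte[] page = new byte[12];
        ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN).putInt(0, 42).putDouble(4, 2.5);
        PlainValuesReaderSketch r = new PlainValuesReaderSketch();
        r.initFromPage(page, 0);
        System.out.println(r.readInt());    // prints "42"
        System.out.println(r.readDouble()); // prints "2.5"
    }
}
```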
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on a diff in the pull request: https://github.com/apache/spark/pull/12397#discussion_r59770275 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java --- @@ -31,6 +33,8 @@ private byte[] buffer; private int offset; private int bitOffset; // Only used for booleans. + + private final static boolean bigEndianPlatform = ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN); --- End diff -- In which method? Do you mean: ByteBuffer.allocate(8).putDouble(v).order(ByteOrder.LITTLE_ENDIAN).getDouble(0); vs ByteBuffer.wrap(buffer).order(ByteOrder.LITTLE_ENDIAN).getDouble(offset); to save the allocate?
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-209971909 @nongli please can you review this
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
GitHub user robbinspg opened a pull request: https://github.com/apache/spark/pull/12397 [SPARK-13745][SQL]Support columnar in memory representation on Big Endian platforms ## What changes were proposed in this pull request? parquet datasource and ColumnarBatch tests fail on big-endian platforms. This patch adds support for the little-endian byte arrays being correctly interpreted on a big-endian platform. ## How was this patch tested? Spark test builds ran on big endian z/Linux and a regression build on little endian amd64 You can merge this pull request into a Git repository by running: $ git pull https://github.com/robbinspg/spark-1 master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12397.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12397 commit 1fc048385fb0fea93eef85f614586448a3ea7c2a Author: Pete Robbins <robbin...@gmail.com> Date: 2016-04-14T13:50:34Z Support columnar in memory representation on Big Endian platforms
[GitHub] spark pull request: [SPARK-12785][SQL] Add ColumnarBatch, an in me...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/10628#issuecomment-209832195

@nongli I'm just about there with a solution for Big Endian platforms and will be using https://issues.apache.org/jira/browse/SPARK-14151 for the changes. I have one question:

It is clear from the tests using Parquet that the byte array passed into putIntsLittleEndian is in little-endian order. It is also the case that the byte arrays passed into putFloats and putDoubles have their values in little-endian order; reversing the floats/doubles enables all the tests to pass.

However, in OffHeapColumnVector's putDoubles(int rowId, int count, byte[] src, int srcIndex), if I assume the input is LE then the org.apache.spark.sql.execution.vectorized.ColumnarBatchSuite "Double APIs" test fails. This is because it passes in a byte array of doubles that are in platform-endian order (created with Platform.putDouble).

My question is: are the byte arrays always in little-endian order? That seems to be true for the Parquet sources. If so, I can modify the org.apache.spark.sql.execution.vectorized.ColumnarBatchSuite "Double APIs" test case to force the test data into LE.
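A minimal sketch of the idea discussed above: decoding doubles from a byte array that is known to be little-endian so the result is correct on any platform. The class and method names here are illustrative only, not Spark's actual ColumnVector API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LeDoubleDemo {
    // Decode `count` doubles from a byte array known to be little-endian,
    // giving the right answer regardless of the platform's native order.
    static double[] readLeDoubles(byte[] src, int srcIndex, int count) {
        ByteBuffer buf = ByteBuffer.wrap(src, srcIndex, count * 8)
                .order(ByteOrder.LITTLE_ENDIAN);
        double[] out = new double[count];
        for (int i = 0; i < count; i++) {
            out[i] = buf.getDouble();
        }
        return out;
    }

    public static void main(String[] args) {
        // Build the test data explicitly in little-endian order, mirroring
        // what the comment suggests for forcing the "Double APIs" test into LE.
        byte[] src = new byte[16];
        ByteBuffer.wrap(src).order(ByteOrder.LITTLE_ENDIAN)
                .putDouble(1.5).putDouble(-2.25);
        double[] vals = readLeDoubles(src, 0, 2);
        System.out.println(vals[0] + " " + vals[1]);
    }
}
```

Because the buffer's byte order is pinned to LITTLE_ENDIAN rather than the platform default, the same bytes decode identically on amd64 and on z/Linux.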
[GitHub] spark pull request: [SPARK-12785][SQL] Add ColumnarBatch, an in me...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/10628#issuecomment-206732654

We are actually seeing this issue in the OnHeap code: the byte array passed into putIntLittleEndian is in little-endian order, but the code tries to read it with Platform.getInt, which on a big-endian platform returns the wrong value because it assumes the bytes are in big-endian order.
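The misread described above can be demonstrated without Spark: interpret a little-endian serialization of 1 with big-endian semantics (standing in for what Platform.getInt does on a big-endian JVM) and the bytes land in the wrong places; a byte swap recovers the intended value. This is only an illustration of the failure mode, not Spark's code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LeIntDemo {
    // Interpret 4 bytes as a big-endian int, the way a plain memory read
    // behaves on a big-endian JVM.
    static int readAsBigEndian(byte[] src, int offset) {
        return ByteBuffer.wrap(src, offset, 4).order(ByteOrder.BIG_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        // The value 1 serialized little-endian, as in the byte arrays handed
        // to putIntsLittleEndian.
        byte[] le = {0x01, 0x00, 0x00, 0x00};
        int wrong = readAsBigEndian(le, 0);      // 0x01000000 = 16777216, not 1
        int right = Integer.reverseBytes(wrong); // swapping the bytes recovers 1
        System.out.println(wrong + " vs " + right);
    }
}
```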
[GitHub] spark pull request: [SPARK-12785][SQL] Add ColumnarBatch, an in me...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/10628#issuecomment-204684533

So big-endian implementations of OffHeapColumnVector and OnHeapColumnVector are needed. I don't think we'd want an inline 'if (bigEndian)' in the relevant methods, so we may want to subclass those classes, override the methods that require big-endian handling, and instantiate the big-endian version in ColumnVector.allocate when on a BE platform. The xxxHeapColumnVector classes would have to lose their 'final' attribute.
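The subclass-and-dispatch idea above can be sketched roughly as follows. Class names, method bodies, and the factory shape are all invented stand-ins for OnHeapColumnVector and ColumnVector.allocate, not Spark's API; the factory takes the byte order as a parameter only so the decision can be exercised on any machine.

```java
import java.nio.ByteOrder;

// Illustrative stand-in for OnHeapColumnVector (no longer final, so the
// big-endian variant can override it).
class LeVector {
    void putIntsLittleEndian(int rowId, int count, byte[] src, int srcIndex) {
        // fast path: straight copy, correct only where native order is LE
    }
}

// Big-endian subclass overriding just the endian-sensitive methods.
class BeVector extends LeVector {
    @Override
    void putIntsLittleEndian(int rowId, int count, byte[] src, int srcIndex) {
        // overridden path: swap the bytes of each int before storing
    }
}

public class VectorFactory {
    // The ColumnVector.allocate-style decision point: pick the BE subclass
    // when the platform is big-endian, the plain class otherwise.
    static LeVector allocate(ByteOrder nativeOrder) {
        return ByteOrder.BIG_ENDIAN.equals(nativeOrder)
                ? new BeVector() : new LeVector();
    }

    public static void main(String[] args) {
        System.out.println(allocate(ByteOrder.nativeOrder()).getClass().getSimpleName());
    }
}
```

Callers go through the factory and the base-class method signatures, so no 'if (bigEndian)' branches leak into the hot per-value paths.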
[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/10421#issuecomment-168963270

I have a fix for the test failure. Should I create a new Jira and PR?
[GitHub] spark pull request: [SPARK-12647][SQL] Fix o.a.s.sqlexecution.Exch...
GitHub user robbinspg opened a pull request: https://github.com/apache/spark/pull/10599

[SPARK-12647][SQL] Fix o.a.s.sqlexecution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator

change expected partition sizes

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/robbinspg/spark-1 branch-1.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10599.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10599

commit 841eed9acb4f2af3a8c63f80594f0d29e5a05669
Author: Pete Robbins <robbin...@gmail.com>
Date: 2016-01-05T10:07:57Z

    Update expected partition size
[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/10421#issuecomment-168973101

created https://issues.apache.org/jira/browse/SPARK-12647 and associated PR
[GitHub] spark pull request: [SPARK-12647][SQL] Fix o.a.s.sqlexecution.Exch...
Github user robbinspg closed the pull request at: https://github.com/apache/spark/pull/10599
[GitHub] spark pull request: [SPARK-12647][SQL] Fix o.a.s.sqlexecution.Exch...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/10599#issuecomment-169148536

I closed this as per request but it states "Closed with unmerged commits"
[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/10421#issuecomment-168919276

Merging this into the 1.6 stream has caused a test failure in org.apache.spark.sql.execution.ExchangeCoordinatorSuite "determining the number of reducers: aggregate operator".

There was a change in master in the ExchangeCoordinatorSuite which set the expected partition sizes to a new value. I do not understand why the change in this PR affects the input partition sizes, but it does. I think this is a test issue rather than an issue with this PR. Should I raise a new Jira to fix the expected partition sizes?
[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/10421#issuecomment-168172679

Re-merged with latest master. Please retest.
[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/10421#issuecomment-168091202

Fixed Scala style check. Please retest.
[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...
Github user robbinspg commented on a diff in the pull request: https://github.com/apache/spark/pull/10421#discussion_r48599743

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoiner.scala ---
@@ -171,7 +171,7 @@ object GenerateUnsafeRowJoiner extends CodeGenerator[(StructType, StructType), U
       |// row1: ${schema1.size} fields, $bitset1Words words in bitset
       |// row2: ${schema2.size}, $bitset2Words words in bitset
       |// output: ${schema1.size + schema2.size} fields, $outputBitsetWords words in bitset
-      |final int sizeInBytes = row1.getSizeInBytes() + row2.getSizeInBytes();
+      |final int sizeInBytes = row1.getSizeInBytes() + row2.getSizeInBytes() - ($sizeReduction * 8);
--- End diff --

OK, sorted. Can this be run by the Jenkins tests?
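The subtraction in the diff accounts for the rows' null-tracking bitsets collapsing into fewer 64-bit words when the two rows are joined: each bitset word is 8 bytes, so the saved words come off the output row's size. A rough sketch of that arithmetic (the helper names are invented for illustration, not taken from GenerateUnsafeRowJoiner):

```java
public class SizeReductionDemo {
    // One null bit per field, packed into 64-bit words.
    static int bitsetWords(int numFields) {
        return numFields == 0 ? 0 : (numFields + 63) / 64;
    }

    // Bytes saved when the two rows' bitsets are merged into one.
    static int sizeReductionBytes(int fields1, int fields2) {
        int separate = bitsetWords(fields1) + bitsetWords(fields2);
        int merged = bitsetWords(fields1 + fields2);
        return (separate - merged) * 8;
    }

    public static void main(String[] args) {
        // Two small rows each carry one bitset word; joined, 8 fields still
        // fit in a single word, so 8 bytes disappear from the output row.
        System.out.println(sizeReductionBytes(3, 5));
    }
}
```

Without the subtraction, the generated joiner over-states the output row's sizeInBytes by exactly those saved bitset bytes.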
[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/10421#issuecomment-167738123

@rxin as the original author of this code could you please review the PR?
[GitHub] spark pull request: [SPARK-12470] Fix size reduction calculation
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/10421#issuecomment-166585412

I believe this is uncovering a test failure in ExchangeCoordinatorSuite, so please hold this PR until I investigate further.

- determining the number of reducers: aggregate operator *** FAILED ***
  3 did not equal 2 (ExchangeCoordinatorSuite.scala:316)
[GitHub] spark pull request: [SPARK-12470] Fix size reduction calculation
GitHub user robbinspg opened a pull request: https://github.com/apache/spark/pull/10421

[SPARK-12470] Fix size reduction calculation

also only allocate required buffer size

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/robbinspg/spark-1 master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10421.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10421

commit 440cc51076d0f35cb30e8c7a6caf0d4607ba78bd
Author: Pete Robbins <robbin...@gmail.com>
Date: 2015-12-21T22:21:21Z

    Fix size reduction calculation
[GitHub] spark pull request: [SPARK-9710] [test] Fix RPackageUtilsSuite whe...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/8008#issuecomment-142583322

My 1.5 branch build is failing as described in SPARK-9710, and I notice that this merge didn't make it into that branch. Any chance this will be backported?
[GitHub] spark pull request: [SPARK-10454][Spark Core] wait for empty event...
GitHub user robbinspg opened a pull request: https://github.com/apache/spark/pull/8605

[SPARK-10454][Spark Core] wait for empty event queue

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/robbinspg/spark-1 DAGSchedulerSuite-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8605.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #8605

commit 24438b3f6a870162c79cded14716ee5828bdf9a6
Author: robbins <robb...@uk.ibm.com>
Date: 2015-09-04T20:28:29Z

    wait for empty event queue
[GitHub] spark pull request: [SPARK-9869][Streaming] Wait for all event not...
GitHub user robbinspg opened a pull request: https://github.com/apache/spark/pull/8589

[SPARK-9869][Streaming] Wait for all event notifications before asserting results

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/robbinspg/spark-1 InputStreamSuite-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8589.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #8589

commit f1df02b6834bed8dd32fc1382617c278a777cac4
Author: robbins <robb...@uk.ibm.com>
Date: 2015-09-03T17:15:56Z

    Wait for all event notifications before asserting results
[GitHub] spark pull request: [SPARK-10431][ Spark Core ] Fix intermittent t...
GitHub user robbinspg opened a pull request: https://github.com/apache/spark/pull/8582

[SPARK-10431][Spark Core] Fix intermittent test failure. Wait for event queue to be clear

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/robbinspg/spark-1 InputOutputMetricsSuite

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8582.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #8582

commit 3f8f21d1a42a8b8fbd87d134821c0afd5cb4216a
Author: robbins <robb...@uk.ibm.com>
Date: 2015-09-02T15:31:34Z

    Fix intermittent test failure. Wait for event queue to be clear
[GitHub] spark pull request: [SPARK-10431][ Spark Core ] Fix intermittent t...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/8582#issuecomment-137432486

I see the test failure is https://issues.apache.org/jira/browse/SPARK-9869, which I'm sure is not related to this pull request. Ironically, looking at SPARK-9869 it looks like it could be a very similar issue to this fix, i.e. waiting for the listener bus to be empty before the asserts!
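The "wait for an empty event queue before asserting" pattern behind these fixes can be shown with a toy bus; this is only a stand-in for Spark's listener bus, whose real waitUntilEmpty API has a different shape.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Toy event bus: events are posted to a queue and delivered later, as a
// listener-bus thread would, so tests that assert immediately can race.
public class BusDemo {
    final ConcurrentLinkedQueue<Runnable> queue = new ConcurrentLinkedQueue<>();

    void post(Runnable event) { queue.add(event); }

    // Deliver everything that has been posted so far.
    void deliverAll() {
        Runnable e;
        while ((e = queue.poll()) != null) e.run();
    }

    // Spin until the queue drains or the timeout expires; tests call this
    // before their asserts instead of assuming delivery already happened.
    boolean waitUntilEmpty(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!queue.isEmpty()) {
            if (System.currentTimeMillis() >= deadline) return false;
            Thread.yield();
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        BusDemo bus = new BusDemo();
        AtomicInteger seen = new AtomicInteger();
        bus.post(seen::incrementAndGet);
        // Drain on another thread, as a real listener bus would.
        Thread drainer = new Thread(bus::deliverAll);
        drainer.start();
        boolean emptied = bus.waitUntilEmpty(10_000);
        drainer.join();
        // Only after the queue is confirmed empty is it safe to assert.
        System.out.println(emptied + " " + seen.get());
    }
}
```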