[jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6
[ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073801#comment-15073801 ]

Tim Preece commented on SPARK-12319:

Michael,
Since this JIRA's description is not quite right and involves two distinct problems, I have created a new JIRA, https://issues.apache.org/jira/browse/SPARK-12555, to address the DatasetAggregatorSuite failure. This is important to us since it causes an explicit build failure on our Big Endian platforms.

> Address endian specific problems surfaced in 1.6
>
> Key: SPARK-12319
> URL: https://issues.apache.org/jira/browse/SPARK-12319
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Environment: Problems apparent on BE, LE could be impacted too
> Reporter: Adam Roberts
> Priority: Critical
>
> JIRA to cover endian specific problems - since testing 1.6 I've noticed
> problems with DataFrames on BE platforms, e.g.
> https://issues.apache.org/jira/browse/SPARK-9858
> [~joshrosen] [~yhuai]
> Current progress: using com.google.common.io.LittleEndianDataInputStream and
> com.google.common.io.LittleEndianDataOutputStream within UnsafeRowSerializer
> fixes three test failures in ExchangeCoordinatorSuite but I'm concerned
> around performance/wider functional implications
> "org.apache.spark.sql.DatasetAggregatorSuite.typed aggregation: class input
> with reordering" fails as we expect "one, 1" but instead get "one, 9" - we
> believe the issue lies within BitSetMethods.java, specifically around:
> return (wi << 6) + subIndex + java.lang.Long.numberOfTrailingZeros(word);

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6
[ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073821#comment-15073821 ]

Tim Preece commented on SPARK-12319:

The remaining problem is ExchangeCoordinatorSuite. I don't have the right access to update the description or title.
[jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6
[ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073813#comment-15073813 ]

Sean Owen commented on SPARK-12319:

[~preece] What's the remaining problem here, then? You can edit the description and title to reflect it.
[jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6
[ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073145#comment-15073145 ]

Tim Preece commented on SPARK-12319:

Hi,
The failing test is already checked in. It is:
"org.apache.spark.sql.DatasetAggregatorSuite.typed aggregation: class input with reordering"

The test only explicitly fails on Big Endian platforms. This is because an integer takes an 8-byte slot in the UnsafeRow. When the data corruption occurs, the BE integer ends up with the wrong value. I added print statements which show the data corruption on Little Endian as well; it just happens not to affect the value of the LE integer, since the LE integer is in the other 4 bytes of the 8-byte slot.
[jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6
[ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073099#comment-15073099 ]

Michael Armbrust commented on SPARK-12319:

Do you want to open a PR with your failing test case?
[jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6
[ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068176#comment-15068176 ]

Tim Preece commented on SPARK-12319:

[~marmbrus] Hi Michael,
I think this may be a problem with the new Dataset API, in particular the new "as" function of DataFrame, which I see is tagged as Experimental.

When we run the DatasetAggregatorSuite test "typed aggregation: class input with reordering", the implementation seems to get confused between the ordering of the data in the UnsafeRow (string,int) and the schema (int,string). This results in a testcase failure that shows up on BE platforms (although the data is also corrupted on LE platforms).

At the moment I'm not sure how to fix it, so any pointers would be helpful.
[jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6
[ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066628#comment-15066628 ]

Tim Preece commented on SPARK-12319:

I notice that for the failing testcase the schema (for row1) does not match the actual data in row1.

Row1 has schema:
SpecificUnsafeRowJoiner schema1 StructType(StructField(a,IntegerType,false), StructField(b,StringType,true))

But row1 has the following data (i.e. a string followed by an int):
row1 [0,180003,1,656e6f]

So why doesn't the schema match the data? The name of the failing test may give a clue!

  test("typed aggregation: class input with reordering") {
    val ds = sql("SELECT 'one' AS b, 1 as a").as[AggData]

    checkAnswer(
      ds.select(ClassInputAgg.toColumn),
      1)

    checkAnswer(
      ds.select(expr("avg(a)").as[Double], ClassInputAgg.toColumn),
      (1.0, 1))

    checkAnswer(
      ds.groupBy(_.b).agg(ClassInputAgg.toColumn),
      ("one", 1))
  }
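As an aside on reading the row dump above: the last word, 656e6f, can be checked by hand. The sketch below is only an illustration, not Spark code. It assumes each entry in the dump is one 8-byte word printed in hex and that the dump was taken on a little-endian machine; the class and method names are hypothetical, and the length 3 is simply the length of the literal 'one' used in the test.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for reading a hex word from the row dump; not part of Spark.
final class RowWordDecode {
    // Lay the word out in little-endian byte order and read the first
    // `length` bytes as ASCII text.
    static String decodeLittleEndian(long word, int length) {
        byte[] bytes = ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(word)
                .array();
        return new String(bytes, 0, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // 0x656e6f laid out little-endian is the bytes 6f 6e 65, i.e. 'o' 'n' 'e'.
        System.out.println(decodeLittleEndian(0x656e6fL, 3)); // prints "one"
    }
}
```

Under those assumptions the dump reads as: a null-bits word (0), a slot describing the string, the int 1, and then the string bytes, which agrees with the "string followed by int" reading above.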
[jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6
[ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062175#comment-15062175 ]

Tim Preece commented on SPARK-12319:

Hi Sean, Yin

I've started to (and continue to) investigate this DatasetAggregatorSuite failure as described above.

So far I believe:
a) The description is incorrect and the failure has nothing to do with endianness or BitSetMethods.java. (It just happens that we see a failure on big-endian platforms - see below.)
b) The problem is probably in the codegen for UnsafeRow joins (GenerateUnsafeRowJoiner).

I see two UnsafeRows being joined: a (string,int) + (string), which results in an UnsafeRow with schema (string,int,string). When we come to update the offsets for the variable-length data (in this case for the first String), the offset is miscalculated (in updateOffset in GenerateUnsafeRowJoiner). This means the int value in the second field slot is wrongly changed, and on a BE platform (for this particular testcase) it is incremented by 8. On a LE platform the value in the second field is also changed, but in a way that does not alter the value of the int. However, for both BE and LE platforms the first String variable looks bogus, with an invalid variable offset.

I'm continuing to investigate (and so could well revise the above), but thought I would share my observations so far.

Also, it would be useful if you happened to have a pointer to any design documentation for UnsafeRow. For example, I wasn't sure whether all the variable-length data should go at the end of the row, that is, whether the schema for the joined row should actually have been (int,string,string).

Tim Preece
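The "incremented by 8 on BE, unchanged on LE" observation can be reproduced outside Spark with a small sketch. This is an illustration under stated assumptions, not the generated joiner code: it assumes the offset adjustment adds the shift to the upper 32 bits of the 8-byte slot, and that the adjustment is wrongly applied to the slot holding the int. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model of one 8-byte fixed-width slot; not part of Spark.
final class SlotShiftDemo {
    // Write an int at the start of the slot, then apply an offset adjustment
    // to the whole slot as if it described variable-length data (assumed form:
    // add `shift` to the upper 32 bits), and read the int back.
    static int intAfterShift(int value, long shift, boolean bigEndian) {
        ByteBuffer slot = ByteBuffer.allocate(Long.BYTES)
                .order(bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        slot.putInt(0, value);                  // int occupies the first 4 bytes of the slot
        long word = slot.getLong(0);            // the slot misread as a single 8-byte word
        slot.putLong(0, word + (shift << 32));  // adjustment lands in the upper 32 bits
        return slot.getInt(0);
    }

    public static void main(String[] args) {
        System.out.println(intAfterShift(1, 8, true));   // big-endian: prints 9
        System.out.println(intAfterShift(1, 8, false));  // little-endian: prints 1
    }
}
```

With a big-endian layout the first 4 bytes of the slot are the upper half of the word, so the int absorbs the shift; with a little-endian layout they are the lower half, so the int survives while the other 4 bytes are still corrupted.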
[jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6
[ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055985#comment-15055985 ]

Sean Owen commented on SPARK-12319:

Do you have any more detail here -- what specifically is the test failure and fix? You're referring to bit twiddling ops in BitSetMethods, but these operators don't have an endian-ness.
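The point about operators having no endianness can be checked with a short sketch. It is an illustration only: the expression is the one quoted in the issue description, but the class, the method names and the way the word is moved through a buffer are assumptions made for the example, not how BitSetMethods obtains its words.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class; only the arithmetic in bitIndex comes from the issue description.
final class BitIndexDemo {
    // The expression quoted from BitSetMethods.java, applied to a long value.
    static int bitIndex(int wi, int subIndex, long word) {
        return (wi << 6) + subIndex + Long.numberOfTrailingZeros(word);
    }

    // Store and load the word with the same byte order: the value is unchanged.
    static long roundTrip(long word, boolean bigEndian) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES)
                .order(bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        buf.putLong(0, word);
        return buf.getLong(0);
    }

    // Store big-endian but load little-endian: the bytes are reversed, so the value changes.
    static long crossRead(long word) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES).order(ByteOrder.BIG_ENDIAN);
        buf.putLong(0, word);
        return buf.order(ByteOrder.LITTLE_ENDIAN).getLong(0);
    }

    public static void main(String[] args) {
        long word = 0x10L; // only bit 4 set
        System.out.println(bitIndex(1, 0, roundTrip(word, true)));   // prints 68
        System.out.println(bitIndex(1, 0, roundTrip(word, false)));  // prints 68
        System.out.println(bitIndex(1, 0, crossRead(word)));         // prints 124
    }
}
```

The shift and the trailing-zero count give the same answer on any platform once the value is in a long; a different answer appears only when the bytes are written with one byte order and read with another.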
[jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6
[ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056040#comment-15056040 ]

Adam Roberts commented on SPARK-12319:

Hi Sean, here are the failures:

ExchangeCoordinatorSuite:
- test estimatePartitionStartIndices - 1 Exchange
- test estimatePartitionStartIndices - 2 Exchanges
- test estimatePartitionStartIndices and enforce minimal number of reducers
- determining the number of reducers: aggregate operator(minNumPostShufflePartitions: 3)
- determining the number of reducers: join operator(minNumPostShufflePartitions: 3)
- determining the number of reducers: complex query 1(minNumPostShufflePartitions: 3)
- determining the number of reducers: complex query 2(minNumPostShufflePartitions: 3)
- determining the number of reducers: aggregate operator *** FAILED ***
  3 did not equal 2 (ExchangeCoordinatorSuite.scala:315)
- determining the number of reducers: join operator *** FAILED ***
  1 did not equal 2 (ExchangeCoordinatorSuite.scala:366)
- determining the number of reducers: complex query 1
- determining the number of reducers: complex query 2 *** FAILED ***
  Set(2) did not equal Set(2, 3) (ExchangeCoordinatorSuite.scala:472)

The fix is to replace the use of DataInput/OutputStreams with LittleEndianDataInput/OutputStream objects in order to have these tests pass on big endian platforms.

With regards to the Dataset failure (using DF behind the scenes and also using the tungsten optimised agg function), here's a snippet of the failing test output:

== Physical Plan ==
TungstenAggregate(key=[value#1148], functions=[(ClassInputAgg$(b#1050,a#1051),mode=Final,isDistinct=false)], output=[value#1148,ClassInputAgg$(b,a)#1162])
 TungstenExchange (HashPartitioning 5), None
  TungstenAggregate(key=[value#1148], functions=[(ClassInputAgg$(b#1050,a#1051),mode=Partial,isDistinct=false)], output=[value#1148,value#1158])
   !AppendColumns , class[a[0]: int, b[0]: string], class[value[0]: string], [value#1148]
    Project [one AS b#1050,1 AS a#1051]
     Scan OneRowRelation[]

== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
![one,1]                    [one,9] (QueryTest.scala:127)

This is for the third checkAnswer call in the reordering test:

  checkAnswer(
    ds.groupBy(_.b).agg(ClassInputAgg.toColumn),
    ("one", 1))

If we change our sql statement from

  val ds = sql("SELECT 'one' AS b, 1 as a").as[AggData]

so that a is, say, 2, we get 10. With 3, we get 11, etc.
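For background on the stream classes mentioned above: java.io.DataOutputStream always writes multi-byte values in big-endian order, whatever the platform, so bytes it produces only read back correctly if the reader also treats them as big-endian. The sketch below shows just that property using the standard library. It does not claim to be the mechanism behind the ExchangeCoordinatorSuite failures, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class; not part of Spark or Guava.
final class StreamOrderDemo {
    // Serialise one long with DataOutputStream, which is big-endian by specification.
    static byte[] write(long value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    // Read the same 8 bytes back using an explicitly chosen byte order.
    static long readAs(byte[] data, boolean bigEndian) {
        return ByteBuffer.wrap(data)
                .order(bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN)
                .getLong(0);
    }

    public static void main(String[] args) {
        byte[] data = write(3L);
        System.out.println(readAs(data, true));   // prints 3
        System.out.println(readAs(data, false));  // prints 216172782113783808 (3 << 56)
    }
}
```

Code that reads such bytes in native order therefore sees different values on BE and LE machines, which is the kind of asymmetry that fixing the stream byte order removes.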