[jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6

Tim Preece (JIRA) Mon, 21 Dec 2015 08:07:24 -0800

    [ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066628#comment-15066628 ]


Tim Preece commented on SPARK-12319:
------------------------------------

I notice that for the failing testcase the schema for row1 does not match the 
actual data in row1.
Row1 has schema:
    SpecificUnsafeRowJoiner schema1 StructType(StructField(a,IntegerType,false), StructField(b,StringType,true))
But row1 has the following data (i.e. a string followed by an int):
    row1 [0,1800000003,1,656e6f]
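
To make the "string followed by an int" reading concrete, here is a minimal sketch that decodes the four words printed for row1. It rests on assumptions that are not stated in this comment: that the dump shows each 8-byte word as a hex long, that the first word is the null bitset, and that a variable-length field slot packs the byte offset in its upper 32 bits and the byte length in its lower 32 bits. Row1Words and its methods are hypothetical helpers, not Spark APIs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Spark: decodes the words printed for row1.
final class Row1Words {
    // Assumed layout: upper 32 bits of a variable-length slot hold the byte
    // offset of the data, measured from the start of the row.
    static int offset(long slot) {
        return (int) (slot >>> 32);
    }

    // Assumed layout: lower 32 bits of the slot hold the length in bytes.
    static int size(long slot) {
        return (int) slot;
    }

    // Lay the word out in little-endian byte order and read `size` bytes as UTF-8.
    static String text(long word, int size) {
        byte[] bytes = ByteBuffer.allocate(8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(word)
                .array();
        return new String(bytes, 0, size, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The words from the dump: bitset, field 0 slot, field 1 slot, string data.
        long[] row1 = {0x0L, 0x1800000003L, 0x1L, 0x656e6fL};
        long slot0 = row1[1];
        System.out.println("field 0: offset=" + offset(slot0)
                + " size=" + size(slot0)
                + " text=" + text(row1[3], size(slot0)));
        System.out.println("field 1: int=" + (int) row1[2]);
    }
}
```

Under these assumptions field 0 points 24 bytes into the row (past the bitset word and the two field slots) at a 3-byte string, and field 1 holds the integer 1, which is the reverse of the column order in schema1.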

So why doesn't the schema match the data? 

The name of the failing test may give a clue!

test("typed aggregation: class input with reordering") {
    val ds = sql("SELECT 'one' AS b, 1 as a").as[AggData]

    checkAnswer(
      ds.select(ClassInputAgg.toColumn),
      1)

    checkAnswer(
      ds.select(expr("avg(a)").as[Double], ClassInputAgg.toColumn),
      (1.0, 1))

    checkAnswer(
      ds.groupBy(_.b).agg(ClassInputAgg.toColumn),
      ("one", 1))
  } 

> Address endian specific problems surfaced in 1.6
> ------------------------------------------------
>
>                 Key: SPARK-12319
>                 URL: https://issues.apache.org/jira/browse/SPARK-12319
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>         Environment: Problems apparent on BE, LE could be impacted too
>            Reporter: Adam Roberts
>            Priority: Critical
>
> JIRA to cover endian-specific problems. Since testing 1.6 I've noticed 
> problems with DataFrames on BE platforms, e.g. 
> https://issues.apache.org/jira/browse/SPARK-9858
> [~joshrosen] [~yhuai]
> Current progress: using com.google.common.io.LittleEndianDataInputStream and 
> com.google.common.io.LittleEndianDataOutputStream within UnsafeRowSerializer 
> fixes three test failures in ExchangeCoordinatorSuite, but I'm concerned 
> about the performance and wider functional implications.
> "org.apache.spark.sql.DatasetAggregatorSuite.typed aggregation: class input 
> with reordering" fails as we expect "one, 1" but instead get "one, 9" - we 
> believe the issue lies within BitSetMethods.java, specifically around: return 
> (wi << 6) + subIndex + java.lang.Long.numberOfTrailingZeros(word); 
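
For readers unfamiliar with the expression quoted in the issue description, the following is a simplified sketch of a next-set-bit search that uses the same arithmetic. It is an illustration under assumptions, not the BitSetMethods.java source: it works on a plain long[] rather than on raw memory, and BitSetSketch and its method signature are hypothetical.

```java
// Simplified sketch over a long[]; the code in BitSetMethods.java is assumed
// to read its 64-bit words from raw memory instead.
final class BitSetSketch {
    // Returns the index of the first set bit at or after fromIndex, or -1.
    static int nextSetBit(long[] words, int fromIndex) {
        int wi = fromIndex >> 6;              // index of the 64-bit word
        if (wi >= words.length) {
            return -1;
        }
        int subIndex = fromIndex & 0x3f;      // bit position inside that word
        long word = words[wi] >>> subIndex;   // drop the bits below fromIndex
        if (word != 0) {
            // The expression from the issue description.
            return (wi << 6) + subIndex + Long.numberOfTrailingZeros(word);
        }
        // Nothing left in the first word: scan the remaining words whole.
        for (wi += 1; wi < words.length; wi++) {
            word = words[wi];
            if (word != 0) {
                return (wi << 6) + Long.numberOfTrailingZeros(word);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] words = {0b1000L, 1L};         // bits 3 and 64 are set
        System.out.println(nextSetBit(words, 0));
        System.out.println(nextSetBit(words, 4));
        System.out.println(nextSetBit(words, 65));
    }
}
```

The arithmetic itself does not depend on byte order. The result can only differ between BE and LE platforms if the value of `word` differs, that is, if the bytes backing the bitset were written with one width or byte order and read back with another.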



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
