[ 
https://issues.apache.org/jira/browse/SPARK-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047194#comment-15047194
 ] 

Adam Roberts commented on SPARK-9858:
-------------------------------------

Several potential issues here, though they may well not be with this code 
itself. I'm consistently encountering problems on two different big-endian 
platforms while testing this.

1) Is this thread safe? I've noticed that if we print the rowBuffer when using 
more than one thread for our SQLContext, the ordering of elements is not 
consistent and we sometimes have two rows printed consecutively.

2) For the aggregate, join, and complex query 2 tests, I consistently receive 
more bytes per partition, and instead of estimating (0, 2) for the indices we 
get (0, 2, 4). I know we're using the UnsafeRowSerializer, so I'm wary that the 
issue may lie there instead; I see it's using Google Guava's ByteStreams class 
to read in the bytes. Specifically, I have 800, 800, 800, 800, 720 bytes per 
partition instead of 600, 600, 600, 600, 600.

3) Where do the values used in the assertions for the test suite come from?

If we print the rows we see differences between the two platforms (the 63 and 
70 are from our BE platform, and these values differ each time we run the 
test).

This works perfectly on various LE architectures, hence the current 
endianness/serialization theory. Apologies if this would be better suited to 
the dev mailing list, although I expect I'm one of the few testing this on 
BE...
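
To make the endianness theory concrete, here is a minimal sketch in plain Java. 
It is not Spark code and does not claim to show what UnsafeRowSerializer 
actually does; the class name ByteOrderSketch and the sample value 600 are made 
up for illustration. It only shows that the same four bytes decode to a 
different int when the reader assumes a different byte order than the writer, 
which is the kind of mismatch that could inflate size estimates on a BE 
platform.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: a hypothetical helper, not part of Spark.
class ByteOrderSketch {
    // Encode an int into 4 bytes using an explicit byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Decode the first 4 bytes of the array using an explicit byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        int written = 600; // arbitrary sample value
        byte[] littleEndian = encode(written, ByteOrder.LITTLE_ENDIAN);

        // The platform's own byte order; this differs between LE and BE machines.
        System.out.println("native order:    " + ByteOrder.nativeOrder());
        // Reader and writer agree on the order: the value round-trips.
        System.out.println("matched read:    " + decode(littleEndian, ByteOrder.LITTLE_ENDIAN));
        // Reader assumes the opposite order: the bytes are reinterpreted.
        System.out.println("mismatched read: " + decode(littleEndian, ByteOrder.BIG_ENDIAN));
    }
}
```

With the matched order the value round-trips as 600; with the mismatched order 
the bytes 58 02 00 00 are read as 0x58020000. Code that writes with the native 
order and reads with a fixed order (or vice versa) would only show this on one 
class of platform, which would fit the LE/BE split described above, but that is 
an assumption to verify rather than a diagnosis.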

> Introduce an ExchangeCoordinator to estimate the number of post-shuffle 
> partitions.
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-9858
>                 URL: https://issues.apache.org/jira/browse/SPARK-9858
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>             Fix For: 1.6.0
>
>




