[jira] [Updated] (SPARK-18800) Correct the assert in UnsafeKVExternalSorter which ensures array size

2016-12-24 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-18800:
--
Assignee: Liang-Chi Hsieh
Priority: Minor  (was: Major)

> Correct the assert in UnsafeKVExternalSorter which ensures array size
> -
>
> Key: SPARK-18800
> URL: https://issues.apache.org/jira/browse/SPARK-18800
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Minor
> Fix For: 2.2.0
>
>
> UnsafeKVExternalSorter uses UnsafeInMemorySorter to sort the records of 
> BytesToBytesMap if it is given a map.
> Currently we use the number of keys in BytesToBytesMap to determine whether 
> the array used for sorting is large enough. We have an assert that checks the 
> size of the array: map.numKeys() <= map.getArray().size() / 2.
> However, each record in the map takes two entries in the array: one for the 
> record pointer and one for the key prefix. So the correct assert should be 
> map.numKeys() * 2 <= map.getArray().size() / 2.
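
The arithmetic behind the two asserts can be checked with a small standalone sketch. This is not Spark code: SortArrayCheck and its methods are hypothetical names, and the array is modeled only by its size, on the assumption stated in the description that each record takes two entries.

```java
// Hypothetical stand-in, not Spark code: models only the sizing arithmetic
// described in the issue (two array entries per record).
final class SortArrayCheck {
    /** Original check: compares the key count with half of the array. */
    static boolean oldCheck(long numKeys, long arraySize) {
        return numKeys <= arraySize / 2;
    }

    /** Corrected check: each record needs a pointer entry and a prefix entry. */
    static boolean correctedCheck(long numKeys, long arraySize) {
        return numKeys * 2 <= arraySize / 2;
    }

    public static void main(String[] args) {
        long arraySize = 16;
        long numKeys = 6;
        // 6 <= 8, so the original check passes.
        System.out.println(oldCheck(numKeys, arraySize));       // true
        // 12 > 8, so the corrected check rejects the same input.
        System.out.println(correctedCheck(numKeys, arraySize)); // false
    }
}
```

With 6 keys and a 16-entry array, the original check passes although the records need 12 entries and only 8 are available in half of the array.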



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18800) Correct the assert in UnsafeKVExternalSorter which ensures array size

2016-12-20 Thread Liang-Chi Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-18800:

Issue Type: Improvement  (was: Bug)

> Correct the assert in UnsafeKVExternalSorter which ensures array size
> -
>
> Key: SPARK-18800
> URL: https://issues.apache.org/jira/browse/SPARK-18800
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>
> UnsafeKVExternalSorter uses UnsafeInMemorySorter to sort the records of 
> BytesToBytesMap if it is given a map.
> Currently we use the number of keys in BytesToBytesMap to determine whether 
> the array used for sorting is large enough. We have an assert that checks the 
> size of the array: map.numKeys() <= map.getArray().size() / 2.
> However, each record in the map takes two entries in the array: one for the 
> record pointer and one for the key prefix. So the correct assert should be 
> map.numKeys() * 2 <= map.getArray().size() / 2.






[jira] [Updated] (SPARK-18800) Correct the assert in UnsafeKVExternalSorter which ensures array size

2016-12-20 Thread Liang-Chi Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-18800:

Description: 
UnsafeKVExternalSorter uses UnsafeInMemorySorter to sort the records of 
BytesToBytesMap if it is given a map.

Currently we use the number of keys in BytesToBytesMap to determine whether 
the array used for sorting is large enough. We have an assert that checks the 
size of the array: map.numKeys() <= map.getArray().size() / 2.

However, each record in the map takes two entries in the array: one for the 
record pointer and one for the key prefix. So the correct assert should be 
map.numKeys() * 2 <= map.getArray().size() / 2.

  was:
UnsafeKVExternalSorter uses UnsafeInMemorySorter to sort the records of 
BytesToBytesMap if it is given a map.

Currently we use the number of keys in BytesToBytesMap to determine whether 
the array used for sorting is large enough. This is likely wrong, because the 
map can hold multiple values for the same key. In the extreme case, 
BytesToBytesMap.numKeys() == 1 while BytesToBytesMap.numValues() is a large 
number.

In this case, we cannot simply reuse BytesToBytesMap's array for sorting. 
Otherwise, an exception like this is thrown:

{code}
[info] - SPARK-18800: pass BytesToBytesMap which contains numValues is more than numKeys *** FAILED *** (61 milliseconds)
[info]   java.lang.IllegalStateException: There is no space for new record
[info]   at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.insertRecord(UnsafeInMemorySorter.java:225)
[info]   at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:147)
{code}


> Correct the assert in UnsafeKVExternalSorter which ensures array size
> -
>
> Key: SPARK-18800
> URL: https://issues.apache.org/jira/browse/SPARK-18800
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>
> UnsafeKVExternalSorter uses UnsafeInMemorySorter to sort the records of 
> BytesToBytesMap if it is given a map.
> Currently we use the number of keys in BytesToBytesMap to determine whether 
> the array used for sorting is large enough. We have an assert that checks the 
> size of the array: map.numKeys() <= map.getArray().size() / 2.
> However, each record in the map takes two entries in the array: one for the 
> record pointer and one for the key prefix. So the correct assert should be 
> map.numKeys() * 2 <= map.getArray().size() / 2.






[jira] [Updated] (SPARK-18800) Correct the assert in UnsafeKVExternalSorter which ensures array size

2016-12-20 Thread Liang-Chi Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-18800:

Summary: Correct the assert in UnsafeKVExternalSorter which ensures array 
size  (was: UnsafeInMemorySorter throws exception when used in 
UnsafeKVExternalSorter)

> Correct the assert in UnsafeKVExternalSorter which ensures array size
> -
>
> Key: SPARK-18800
> URL: https://issues.apache.org/jira/browse/SPARK-18800
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>
> UnsafeKVExternalSorter uses UnsafeInMemorySorter to sort the records of 
> BytesToBytesMap if it is given a map.
> Currently we use the number of keys in BytesToBytesMap to determine if the 
> array used for sort is enough or not. This is likely wrong, because the map 
> can hold multiple values for the same key. In the extreme case, 
> BytesToBytesMap.numKeys() == 1 while BytesToBytesMap.numValues() is a large 
> number.
> In this case, we cannot simply reuse BytesToBytesMap's array for sorting. 
> Otherwise, an exception like this is thrown:
> {code}
> [info] - SPARK-18800: pass BytesToBytesMap which contains numValues is more than numKeys *** FAILED *** (61 milliseconds)
> [info]   java.lang.IllegalStateException: There is no space for new record
> [info]   at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.insertRecord(UnsafeInMemorySorter.java:225)
> [info]   at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:147)
> {code}
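
The situation described in this earlier summary, one key with many values overflowing a buffer sized by the key count, can be illustrated with a toy model. Everything here is an assumption for illustration: MultiValueSketch and FixedSorter are hypothetical names, a plain Java map stands in for BytesToBytesMap, and only the exception message is taken from the log above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Spark's BytesToBytesMap or UnsafeInMemorySorter.
final class MultiValueSketch {
    /** Fixed-capacity record buffer that rejects inserts once it is full. */
    static final class FixedSorter {
        private final String[] records;
        private int size = 0;

        FixedSorter(int capacity) {
            records = new String[capacity];
        }

        void insertRecord(String record) {
            if (size == records.length) {
                throw new IllegalStateException("There is no space for new record");
            }
            records[size++] = record;
        }
    }

    /** Total number of values across all keys. */
    static int numValues(Map<String, List<String>> map) {
        int total = 0;
        for (List<String> values : map.values()) {
            total += values.size();
        }
        return total;
    }

    /** Returns true if every value fits into a buffer of the given capacity. */
    static boolean fits(Map<String, List<String>> map, int capacity) {
        FixedSorter sorter = new FixedSorter(capacity);
        try {
            for (List<String> values : map.values()) {
                for (String value : values) {
                    sorter.insertRecord(value);
                }
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> map = new LinkedHashMap<>();
        List<String> values = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            values.add("v" + i);
        }
        map.put("k", values);

        int numKeys = map.size();
        System.out.println(numKeys + " key, " + numValues(map) + " values");
        System.out.println(fits(map, numKeys));        // buffer sized by key count
        System.out.println(fits(map, numValues(map))); // buffer sized by value count
    }
}
```

In this model, a buffer sized by the number of keys rejects the second insert, while one sized by the number of values accepts all of them.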


