[ https://issues.apache.org/jira/browse/CARBONDATA-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin updated CARBONDATA-1839:
-----------------------------------
    Description: 
CarbonData provides an option to optimize the data load process by compressing the
intermediate sort temp files.

The option is `carbon.is.sort.temp.file.compression.enabled` and its default
value is `false`. In disk-constrained scenarios, users can turn this feature on by
setting the option to `true`; CarbonData will then compress the file content before
writing it to disk.
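
For illustration, a minimal sketch of turning the feature on programmatically,
assuming the `CarbonProperties` singleton from `org.apache.carbondata.core.util`
(the same key can also be set in the carbon.properties file):

```scala
import org.apache.carbondata.core.util.CarbonProperties

// Enable compression of the intermediate sort temp files before they are
// written to disk; the default value of this property is "false".
CarbonProperties.getInstance()
  .addProperty("carbon.is.sort.temp.file.compression.enabled", "true")
```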

However, I have found bugs in the related code, and the data load failed
after turning on this feature.

This bug can be reproduced easily. I used the example from `TestLoadDataFrame`
at line 98. The steps are listed below, followed by a rough reproduction sketch.

1. create a dataframe (e.g. 320000 rows with 3 columns)
2. set `carbon.is.sort.temp.file.compression.enabled=true` in CarbonProperties
3. write the dataframe to a carbon table through the DataFrameWriter
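
The sketch below is modeled loosely on the `TestLoadDataFrame` example and the
CarbonData quick-start of that era; the session setup, store path, column names,
and table name are illustrative assumptions, not the exact test code:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.CarbonSession._
import org.apache.carbondata.core.util.CarbonProperties

object CompressedSortTempRepro {
  def main(args: Array[String]): Unit = {
    // Assumed setup: a local CarbonSession; the store path is illustrative.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("CompressedSortTempRepro")
      .getOrCreateCarbonSession("/tmp/carbon.store")
    import spark.implicits._

    // 1. create a dataframe with 320000 rows and 3 columns
    val df = spark.sparkContext
      .parallelize(1 to 320000)
      .map(x => ("a" + x % 1000, "b", x))
      .toDF("c1", "c2", "c3")

    // 2. enable compression of the intermediate sort temp files
    CarbonProperties.getInstance()
      .addProperty("carbon.is.sort.temp.file.compression.enabled", "true")

    // 3. write the dataframe into a carbon table through the DataFrameWriter
    df.write
      .format("carbondata")
      .option("tableName", "carbon_df_table")
      .mode(SaveMode.Overwrite)
      .save()
  }
}
```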

Error messages are shown below:

```
17/11/29 18:04:12 ERROR SortDataRows: SortDataRowPool:test1
java.lang.ClassCastException: [B cannot be cast to [Ljava.lang.Integer;
    at org.apache.carbondata.core.util.NonDictionaryUtil.getDimension(NonDictionaryUtil.java:93)
    at org.apache.carbondata.processing.sort.sortdata.UnCompressedTempSortFileWriter.writeDataOutputStream(UnCompressedTempSortFileWriter.java:52)
    at org.apache.carbondata.processing.sort.sortdata.CompressedTempSortFileWriter.writeSortTempFile(CompressedTempSortFileWriter.java:65)
    at org.apache.carbondata.processing.sort.sortdata.SortTempFileChunkWriter.writeSortTempFile(SortTempFileChunkWriter.java:72)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.writeSortTempFile(SortDataRows.java:245)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.writeDataTofile(SortDataRows.java:232)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.access$300(SortDataRows.java:45)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter.run(SortDataRows.java:426)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```
```
17/11/29 18:04:13 ERROR SortDataRows: SafeParallelSorterPool:test1 exception occurred while trying to acquire a semaphore lock: Task org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter@3d413b40 rejected from java.util.concurrent.ThreadPoolExecutor@cb56011[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
17/11/29 18:04:13 ERROR ParallelReadMergeSorterImpl: SafeParallelSorterPool:test1
org.apache.carbondata.processing.sort.exception.CarbonSortKeyAndGroupByException:
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:173)
    at org.apache.carbondata.processing.loading.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.run(ParallelReadMergeSorterImpl.java:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.RejectedExecutionException: Task org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter@3d413b40 rejected from java.util.concurrent.ThreadPoolExecutor@cb56011[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:169)
    ... 4 more
```
```
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.carbondata.processing.sort.exception.CarbonSortKeyAndGroupByException:
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:173)
    at org.apache.carbondata.processing.loading.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.run(ParallelReadMergeSorterImpl.java:227)
    ... 3 more
Caused by: java.util.concurrent.RejectedExecutionException: Task org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter@3d413b40 rejected from java.util.concurrent.ThreadPoolExecutor@cb56011[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:169)
    ... 4 more
```



> Data load failed when using compressed sort temp file
> -----------------------------------------------------
>
>                 Key: CARBONDATA-1839
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1839
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: xuchuanyin
>            Assignee: xuchuanyin
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
