[jira] [Commented] (TEZ-3113) massive increase of run time using PipelinedSorter rather than DefaultSorter

2016-08-24 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435318#comment-15435318
 ] 

Hitesh Shah commented on TEZ-3113:
--

[~mingma] [~cchepelov] any chance of updating the logs to help 
[~rajesh.balamohan] debug further? 

> massive increase of run time using PipelinedSorter rather than DefaultSorter
> 
>
> Key: TEZ-3113
> URL: https://issues.apache.org/jira/browse/TEZ-3113
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.8.2
> Environment: scalding 0.15-SNAPSHOT per 
> https://github.com/twitter/scalding/pull/1446
> cascading 3.1.0-wip-54
> tez-0.8.2
> OpenJDK 8 on AMD64
> Hadoop 2.6.0 (YARN, HDFS); Apache distribution
> Debian Linux 8
> 8 * Intel Core i7-3770K 
>Reporter: Cyrille Chépélov
>
> While running a (fairly complex) scalding DAG that was working fine using 
> tez-0.6.2, now under tez-0.8.2, the run time became suddenly extremely large.
> Reverting "tez.runtime.sorter.class" -> "LEGACY" restored proper behaviour.
> Difficulties can be traced to this shape of code:
> {code:scala}
> val x: TypedPipe[(String, String)] = ??? // get *LARGE* dataset 
> x
>   .group
>   .mapValues(x => 1L)
>   .sum
>   .write(TypedTsvHeader("foo.tsv", ('key, 'count)))
> {code}
> where the incoming data contains many, many different keys. Observed 
> behaviour of PipelinedSorter is that several hundred thousand different files 
> are put flat in the same per-TezChild local temporary directories, and thing 
> become very slow (not alleging any causality).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3113) massive increase of run time using PipelinedSorter rather than DefaultSorter

2016-07-26 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394807#comment-15394807
 ] 

Ming Ma commented on TEZ-3113:
--

We also ran into similar issue.

The pipeline sorter setting:

{noformat}
2016-07-26 18:05:48,398 [INFO] [TezChild] |impl.PipelinedSorter|: Setting up 
PipelinedSorter for CB70EF1A28614FED8BA2F7C7D271D11F: , 
UsingHashComparator=false#blocks=1, maxMemUsage=104857600, 
lazyAllocateMem=false, minBlockSize=2097152000, initial BLOCK_SIZE=104857600, 
finalMergeEnabled=true, pipelinedShuffle=false, sendEmptyPartitions=true, 
tez.runtime.io.sort.mb=100
{noformat}

>From the call stack below, regularly a single record is larger than sort 
>buffer size and thus caused the sorter to spill the single record.

{noformat}
at 
cascading.tuple.hadoop.TupleSerialization.getSerializationFactory(TupleSerialization.java:296)
at 
cascading.tuple.hadoop.TupleSerialization.getNewSerializer(TupleSerialization.java:602)
at 
cascading.tuple.hadoop.TupleSerialization$SerializationElementWriter.getSerializer(TupleSerialization.java:1033)
at 
cascading.tuple.hadoop.TupleSerialization$SerializationElementWriter.write(TupleSerialization.java:1013)
at 
cascading.tuple.io.TupleOutputStream.writeElement(TupleOutputStream.java:170)
at 
cascading.tuple.io.TupleOutputStream.writeUnTyped(TupleOutputStream.java:135)
at 
cascading.tuple.hadoop.io.BaseSerializer.serialize(BaseSerializer.java:72)
at 
cascading.tuple.hadoop.io.BaseSerializer.serialize(BaseSerializer.java:32)
at 
org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.append(IFile.java:277)
at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spillSingleRecord(PipelinedSorter.java:484)
at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:409)
- locked <0x0007682297a8> (a 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter)
at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:415)
- locked <0x0007682297a8> (a 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter)
at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:415)
- locked <0x0007682297a8> (a 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter)
at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:362)
at 
org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:163)
{noformat}

There are lots of entries like this in the container log.

{noformat}
2016-07-26 00:06:16,279 [INFO] [TezChild] |impl.PipelinedSorter|: 
C1A0DD178C964C6F90553E14996CD011: done sorting span=0, length=0, time=0
2016-07-26 00:06:16,279 [INFO] [TezChild] |impl.PipelinedSorter|: 
C1A0DD178C964C6F90553E14996CD011: reserved.remaining()=104857600, 
reserved.metasize=16777216
2016-07-26 00:06:16,279 [INFO] [TezChild] |impl.PipelinedSorter|: 
C1A0DD178C964C6F90553E14996CD011: done sorting span=0, length=0, time=0
2016-07-26 00:06:16,279 [INFO] [TezChild] |impl.PipelinedSorter|: 
C1A0DD178C964C6F90553E14996CD011: reserved.remaining()=104857600, 
reserved.metasize=16777216
2016-07-26 00:06:16,279 [INFO] [TezChild] |impl.PipelinedSorter|: 
C1A0DD178C964C6F90553E14996CD011: done sorting span=0, length=0, time=0
2016-07-26 00:06:16,280 [INFO] [TezChild] |impl.PipelinedSorter|: 
C1A0DD178C964C6F90553E14996CD011: reserved.remaining()=104857600, 
reserved.metasize=16777216
2016-07-26 00:06:16,280 [INFO] [TezChild] |impl.PipelinedSorter|: 
C1A0DD178C964C6F90553E14996CD011: Spilling to 
/.../application_1466709597073_272545/output/attempt_1466709597073_272545_1_02_001204_0_10007_14/file.out,
 
indexFilename=/.../application_1466709597073_272545/output/attempt_1466709597073_272545_1_02_001204_0_10007_14/file.out.index
{noformat}

> massive increase of run time using PipelinedSorter rather than DefaultSorter
> 
>
> Key: TEZ-3113
> URL: https://issues.apache.org/jira/browse/TEZ-3113
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.8.2
> Environment: scalding 0.15-SNAPSHOT per 
> https://github.com/twitter/scalding/pull/1446
> cascading 3.1.0-wip-54
> tez-0.8.2
> OpenJDK 8 on AMD64
> Hadoop 2.6.0 (YARN, HDFS); Apache distribution
> Debian Linux 8
> 8 * Intel Core i7-3770K 
>Reporter: Cyrille Chépélov
>
> While running a (fairly complex) scalding DAG that was working fine using 
> tez-0.6.2, now under tez-0.8.2, the run time became suddenly extremely large.
> Reverting "tez.runtime.sorter.class" -> "LEGACY" restored proper behaviour.
> Difficulties can be traced to this 

[jira] [Commented] (TEZ-3113) massive increase of run time using PipelinedSorter rather than DefaultSorter

2016-06-20 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15339044#comment-15339044
 ] 

Tsuyoshi Ozawa commented on TEZ-3113:
-

> Observed behaviour of PipelinedSorter is that several hundred thousand 
> different files are put flat in the same per-TezChild local temporary 
> directories, and thing become very slow (not alleging any causality)

I think this behaviour can cause file-system level lock contention if lots 
threads access.

One possible solution is partitioning by adding upper limits per directory or 
changing directory structure. [~rajesh.balamohan] what do you think?

> massive increase of run time using PipelinedSorter rather than DefaultSorter
> 
>
> Key: TEZ-3113
> URL: https://issues.apache.org/jira/browse/TEZ-3113
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.8.2
> Environment: scalding 0.15-SNAPSHOT per 
> https://github.com/twitter/scalding/pull/1446
> cascading 3.1.0-wip-54
> tez-0.8.2
> OpenJDK 8 on AMD64
> Hadoop 2.6.0 (YARN, HDFS); Apache distribution
> Debian Linux 8
> 8 * Intel Core i7-3770K 
>Reporter: Cyrille Chépélov
>
> While running a (fairly complex) scalding DAG that was working fine using 
> tez-0.6.2, now under tez-0.8.2, the run time became suddenly extremely large.
> Reverting "tez.runtime.sorter.class" -> "LEGACY" restored proper behaviour.
> Difficulties can be traced to this shape of code:
> {code:scala}
> val x: TypedPipe[(String, String)] = ??? // get *LARGE* dataset 
> x
>   .group
>   .mapValues(x => 1L)
>   .sum
>   .write(TypedTsvHeader("foo.tsv", ('key, 'count)))
> {code}
> where the incoming data contains many, many different keys. Observed 
> behaviour of PipelinedSorter is that several hundred thousand different files 
> are put flat in the same per-TezChild local temporary directories, and thing 
> become very slow (not alleging any causality).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3113) massive increase of run time using PipelinedSorter rather than DefaultSorter

2016-02-11 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142954#comment-15142954
 ] 

Rajesh Balamohan commented on TEZ-3113:
---

Is it possible to share the logs and the sort.mb settings for this job?

> massive increase of run time using PipelinedSorter rather than DefaultSorter
> 
>
> Key: TEZ-3113
> URL: https://issues.apache.org/jira/browse/TEZ-3113
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.8.2
> Environment: scalding 0.15-SNAPSHOT per 
> https://github.com/twitter/scalding/pull/1446
> cascading 3.1.0-wip-54
> tez-0.8.2
> OpenJDK 8 on AMD64
> Hadoop 2.6.0 (YARN, HDFS); Apache distribution
> Debian Linux 8
> 8 * Intel Core i7-3770K 
>Reporter: Cyrille Chépélov
>
> While running a (fairly complex) scalding DAG that was working fine using 
> tez-0.6.2, now under tez-0.8.2, the run time became suddenly extremely large.
> Reverting "tez.runtime.sorter.class" -> "LEGACY" restored proper behaviour.
> Difficulties can be traced to this shape of code:
> {code:scala}
> val x: TypedPipe[(String, String)] = ??? // get *LARGE* dataset 
> x
>   .group
>   .mapValues(x => 1L)
>   .sum
>   .write(TypedTsvHeader("foo.tsv", ('key, 'count)))
> {code}
> where the incoming data contains many, many different keys. Observed 
> behaviour of PipelinedSorter is that several hundred thousand different files 
> are put flat in the same per-TezChild local temporary directories, and thing 
> become very slow (not alleging any causality).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)