JingChen created CRUNCH-624:
-------------------------------
Summary: temporary table size is 0, which makes reducer number too
small
Key: CRUNCH-624
URL: https://issues.apache.org/jira/browse/CRUNCH-624
Project: Crunch
Issue Type: Bug
Components: Core
Reporter: JingChen
Assignee: Josh Wills
If the pipeline produces a temporary table, the number of reducers for a
stage whose input is that temporary table may become very small in some cases,
because the temporary table has no content yet.
I may have found the root cause in my case:
  public void materializeAt(SourceTarget<S> sourceTarget) {
    this.materializedAt = sourceTarget;
    this.size = materializedAt.getSize(getPipeline().getConfiguration());
  }

  @Override
  public long getSize() {
    if (size < 0) {
      this.size = getSizeInternal();
    }
    return size;
  }
PCollectionImpl.materializeAt(sourceTarget) is invoked when a node is split to
create a temporary table. The sourceTarget is bound to the new temporary table,
whose size is 0 because its path was just created, so this.size is set to 0.
Later, when getSize() is called while setting the number of reducers, size is 0
rather than negative, so the cached 0 is returned without calling
getSizeInternal(), which makes the number of reducers too small.
So I think materializeAt() should check the sourceTarget's size, like below:
  public void materializeAt(SourceTarget<S> sourceTarget) {
    this.materializedAt = sourceTarget;
    long size = materializedAt.getSize(getPipeline().getConfiguration());
    if (size > 0)
      this.size = size;
  }
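
To make the difference concrete, here is a minimal standalone sketch of the caching behaviour described above. It is not Crunch code: MaterializeSizeSketch, CachedSize, materializeAtOriginal, materializeAtProposed and recommendedReducers are hypothetical names, the LongSupplier stands in for getSizeInternal(), and the reducer formula (1 + size / bytesPerReducer) is only an assumption used for illustration.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for the size caching in PCollectionImpl; not Crunch code.
class MaterializeSizeSketch {

  static class CachedSize {
    private long size = -1L;              // negative means "not computed yet"
    private final LongSupplier estimate;  // plays the role of getSizeInternal()

    CachedSize(LongSupplier estimate) {
      this.estimate = estimate;
    }

    // Behaviour quoted in the report: cache whatever the target reports, even 0.
    void materializeAtOriginal(long targetSize) {
      this.size = targetSize;
    }

    // Proposed behaviour: ignore a non-positive size from a freshly created path.
    void materializeAtProposed(long targetSize) {
      if (targetSize > 0) {
        this.size = targetSize;
      }
    }

    long getSize() {
      if (size < 0) {
        size = estimate.getAsLong();
      }
      return size;
    }
  }

  // Assumed formula for illustration only; the real planner may differ.
  static int recommendedReducers(long sizeInBytes, long bytesPerReducer) {
    return 1 + (int) (sizeInBytes / bytesPerReducer);
  }

  public static void main(String[] args) {
    long upstreamEstimate = 5_000_000_000L;  // made-up estimate from upstream inputs
    long bytesPerReducer = 1_000_000_000L;   // made-up setting

    CachedSize original = new CachedSize(() -> upstreamEstimate);
    original.materializeAtOriginal(0L);      // empty temporary path reports 0

    CachedSize proposed = new CachedSize(() -> upstreamEstimate);
    proposed.materializeAtProposed(0L);      // 0 is ignored, estimate is used later

    // prints "original: size=0 reducers=1"
    System.out.println("original: size=" + original.getSize()
        + " reducers=" + recommendedReducers(original.getSize(), bytesPerReducer));
    // prints "proposed: size=5000000000 reducers=6"
    System.out.println("proposed: size=" + proposed.getSize()
        + " reducers=" + recommendedReducers(proposed.getSize(), bytesPerReducer));
  }
}
```

With the original assignment the cached 0 wins and the estimate is never consulted. With the size > 0 check, getSize() falls back to the estimate because size is still negative.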
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)