[ https://issues.apache.org/jira/browse/PIG-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583602#comment-13583602 ]
Rohini Palaniswamy commented on PIG-3204:
-----------------------------------------
cat log4j.conf
{code}
# ***** Set root logger level to DEBUG and its only appender to A.
log4j.rootLogger=debug, A
# ***** A is set to be a ConsoleAppender.
log4j.appender.A=org.apache.log4j.ConsoleAppender
# ***** A uses PatternLayout.
log4j.appender.A.layout=org.apache.log4j.PatternLayout
log4j.appender.A.layout.ConversionPattern=%d [%t] %-5p %c %x - %m%n
{code}
cat simpleload.pig
{code}
A = LOAD '/tmp/data';
STORE A into '/tmp/out';
{code}
pig -log4jconf ~/pig/log4j.conf simpleload.pig
Console output was saved to /tmp/debug.log.
Running
{code}
sed -n '/Pig features used in the script/,/getDelegationToken/p' /tmp/debug.log | grep getFileInfo | wc -l
{code}
gives 20 getFileInfo calls if /tmp/data is a directory and 35 calls if
/tmp/data is a file.
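To repeat the measurement for both cases (/tmp/data as a directory, then as a file), the pipeline above can be wrapped in a small function. This is a sketch only: the name count_getfileinfo is hypothetical, and it assumes the console output was captured to a log file as described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig): count getFileInfo lines in the window
# that starts at the "Pig features used in the script" line and ends at the
# first getDelegationToken line, i.e. the same range as the pipeline above.
count_getfileinfo() {
  log_file="$1"
  # sed prints only the lines inside the marker range (both markers included).
  # grep -c counts matching lines, the same number "grep | wc -l" reports,
  # but without the leading padding some wc implementations add.
  sed -n '/Pig features used in the script/,/getDelegationToken/p' "$log_file" \
    | grep -c getFileInfo
}
```

Usage: count_getfileinfo /tmp/debug.log. If the log never reaches a getDelegationToken line, the sed range runs to the end of the file, so the count then covers everything after the first marker.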
grep org.apache.pig.builtin.JsonMetadata /tmp/debug.log returns 10 lines of the form
2013-02-21 22:04:41,096 [main] DEBUG org.apache.pig.builtin.JsonMetadata - Could not find schema file for /tmp/data
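To check whether those DEBUG lines are repeated lookups of the same path, they can be tallied per path. This is a sketch: the function name schema_misses_by_path is hypothetical, and it assumes each log event sits on one line, as the PatternLayout above terminates each message with %n.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig): count the JsonMetadata
# "Could not find schema file for <path>" DEBUG lines per path.
# Assumes one log event per line.
schema_misses_by_path() {
  log_file="$1"
  grep 'org.apache.pig.builtin.JsonMetadata' "$log_file" \
    | grep 'Could not find schema file for ' \
    | sed 's/.*Could not find schema file for //' \
    | sort \
    | uniq -c
}
```

Each output line is a count followed by a path; a single path with a count well above 1 indicates the same schema lookup being repeated.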
I haven't stepped through the code, but based on the logs this looks like a good
candidate for optimization to cut down the number of FS calls.
> Optimize the number of FS calls to get schema to cut down time before job launch
> --------------------------------------------------------------------------------
>
> Key: PIG-3204
> URL: https://issues.apache.org/jira/browse/PIG-3204
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
>
> Currently a lot of NN calls are made to determine whether there is a
> schema file for a path in a LOAD statement. When the NN is slow (caused by a
> whole bunch of other issues), this takes a lot of time, and we found
> scripts spending anywhere from 5 to 40 minutes on it, depending on the
> script. It seems to be a good place for optimization.
--
This message is automatically generated by JIRA.