[ 
https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736250#comment-13736250
 ] 

Benjamin Jakobus commented on HIVE-5009:
----------------------------------------

I'm trying to test my code (just cloned a new copy) but keep on getting a 
runtime exception (I haven't applied my patch yet):

1) Downloaded Hadoop 1.2.1.
2) ant -Dhadoop.version=1.2.1 clean package  (OK - no problems)
3) export HIVE_OPTS='-hiveconf mapred.job.tracker=local -hiveconf fs.default.name=file:///tmp \
    -hiveconf hive.metastore.warehouse.dir=file:///tmp/warehouse \
    -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true'
4) export HADOOP_HOME=~/Workspace/hadoop-1.2.1/
5) Running Hive via CLI (/build/dist/bin/hive) - show tables; quit;  (works)
6) Trying to run some test scripts I get:

java.lang.RuntimeException: Cannot serialize object
        at 
org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:426)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:330)
        at 
org.apache.hadoop.hive.ql.exec.Utilities.serializeObject(Utilities.java:611)
        at org.apache.hadoop.hive.ql.plan.MapredWork.toXML(MapredWork.java:88)
        at 
org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:505)
        at 
org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:182)
        at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
        at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
        at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
        at 
org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:79)
        at 
org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:90)
        at 
org.apache.hadoop.hive.ql.parse.MapReduceCompiler.compile(MapReduceCompiler.java:292)
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8333)
        at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:341)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:966)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:878)
        at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
        at 
org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446)
        at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456)
        at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:737)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.Exception: XMLEncoder: discarding statement 
XMLEncoder.writeObject(MapredWork);
        ... 32 more
Caused by: java.lang.RuntimeException: Cannot serialize object
        at 
org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at 
java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:256)
        at 
java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400)
        at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeObject1(Encoder.java:258)
        at java.beans.Encoder.cloneStatement(Encoder.java:271)
        at java.beans.Encoder.writeStatement(Encoder.java:301)
        at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:400)
        ... 31 more
Caused by: java.lang.RuntimeException: Cannot serialize object
        at 
org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at 
java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:256)
        at 
java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400)
        at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at 
java.beans.DefaultPersistenceDelegate.doProperty(DefaultPersistenceDelegate.java:194)
        at 
java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:253)
        ... 44 more
Caused by: java.lang.RuntimeException: Cannot serialize object
        at 
org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at 
java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:256)
        at 
java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400)
        at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at 
java.beans.DefaultPersistenceDelegate.doProperty(DefaultPersistenceDelegate.java:194)
        at 
java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:253)
        ... 52 more
Caused by: java.lang.RuntimeException: Cannot serialize object
        at 
org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:426)
        at 
java.beans.DefaultPersistenceDelegate.invokeStatement(DefaultPersistenceDelegate.java:217)
        at 
java.beans.java_util_List_PersistenceDelegate.initialize(MetaData.java:649)
        at 
java.beans.PersistenceDelegate.initialize(PersistenceDelegate.java:212)
        at 
java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:398)
        at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at 
java.beans.DefaultPersistenceDelegate.doProperty(DefaultPersistenceDelegate.java:194)
        at 
java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:253)
        ... 65 more
Caused by: java.lang.Exception: XMLEncoder: discarding statement 
ArrayList.add(ArrayList);
        ... 82 more
Caused by: java.lang.RuntimeException: Cannot serialize object
        at 
org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:426)
        at 
java.beans.DefaultPersistenceDelegate.invokeStatement(DefaultPersistenceDelegate.java:217)
        at 
java.beans.java_util_List_PersistenceDelegate.initialize(MetaData.java:649)
        at 
java.beans.PersistenceDelegate.initialize(PersistenceDelegate.java:212)
        at 
java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:398)
        at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeObject1(Encoder.java:258)
        at java.beans.Encoder.cloneStatement(Encoder.java:271)
        at java.beans.Encoder.writeStatement(Encoder.java:301)
        at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:400)
        ... 81 more
Caused by: java.lang.Exception: XMLEncoder: discarding statement 
ArrayList.add(ASTNode);
        ... 98 more
Caused by: java.lang.RuntimeException: Cannot serialize object
        at 
org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at 
java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:238)
        at 
java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400)
        at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeObject1(Encoder.java:258)
        at java.beans.Encoder.cloneStatement(Encoder.java:271)
        at java.beans.Encoder.writeStatement(Encoder.java:301)
        at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:400)
        ... 97 more
Caused by: java.lang.RuntimeException: Cannot serialize object
        at 
org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
        at java.beans.Encoder.getValue(Encoder.java:108)
        at java.beans.Encoder.get(Encoder.java:252)
        at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:112)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
        at java.beans.Encoder.writeObject(Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
        at java.beans.Encoder.writeExpression(Encoder.java:330)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
        at 
java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:232)
        ... 110 more
Caused by: java.lang.InstantiationException: org.antlr.runtime.CommonToken
        at java.lang.Class.newInstance0(Class.java:359)
        at java.lang.Class.newInstance(Class.java:327)
        at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
        at sun.reflect.GeneratedMethodAccessor81.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
        at java.beans.Statement.invokeInternal(Statement.java:292)
        at java.beans.Statement.access$000(Statement.java:58)
        at java.beans.Statement$2.run(Statement.java:185)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.beans.Statement.invoke(Statement.java:182)
        at java.beans.Expression.getValue(Expression.java:153)
        at java.beans.Encoder.getValue(Encoder.java:105)
        ... 122 more
FAILED: SemanticException Generate Map Join Task Error: Cannot serialize object
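
For readers unfamiliar with java.beans.XMLEncoder, here is a minimal sketch of
the mechanism the innermost frames point at. It is not a diagnosis of this
build. TokenLike is a hypothetical stand-in for a class whose reflective
no-argument instantiation fails, and EncoderSketch / encodeAndCollect are
illustrative names, not Hive code. XMLEncoder passes such failures to its
ExceptionListener; in the trace above, Hive's listener
(Utilities$1.exceptionThrown) rethrows them as "Cannot serialize object".

```java
import java.beans.ExceptionListener;
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

final class EncoderSketch {
    // Hypothetical class: it only has a one-argument constructor, so the
    // default persistence delegate cannot re-create it with "new TokenLike()".
    public static final class TokenLike {
        private final int type;
        public TokenLike(int type) { this.type = type; }
        public int getType() { return type; }
    }

    // Encodes the object and returns every exception XMLEncoder reported
    // to its listener instead of throwing (as Hive's listener does).
    static List<Exception> encodeAndCollect(Object o) {
        final List<Exception> problems = new ArrayList<Exception>();
        XMLEncoder enc = new XMLEncoder(new ByteArrayOutputStream());
        enc.setExceptionListener(new ExceptionListener() {
            @Override
            public void exceptionThrown(Exception e) {
                problems.add(e);
            }
        });
        enc.writeObject(o);
        enc.close();
        return problems;
    }

    public static void main(String[] args) {
        List<Exception> problems = encodeAndCollect(new TokenLike(7));
        for (Exception e : problems) {
            System.out.println(e);
        }
    }
}
```

A listener that collects exceptions rather than rethrowing makes it easier to
see which class in the object graph the encoder could not instantiate.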



                
> Fix minor optimization issues
> -----------------------------
>
>                 Key: HIVE-5009
>                 URL: https://issues.apache.org/jira/browse/HIVE-5009
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Benjamin Jakobus
>            Assignee: Benjamin Jakobus
>            Priority: Minor
>             Fix For: 0.12.0
>
>         Attachments: AbstractBucketJoinProc.java
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I have found some minor optimization issues in the codebase, which I would 
> like to rectify and contribute. The optimizations that could be applied to 
> Hive's code base are as follows:
> 1. Use StringBuffer/StringBuilder when appending strings - In 184 instances, 
> the concatenation operator (+=) was used when appending strings. This is 
> inherently inefficient - instead Java's StringBuffer or StringBuilder class 
> should be used. 12 instances of this optimization can be applied to the 
> GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver 
> uses the + operator inside a loop, as do the column projection utilities 
> class (ColumnProjectionUtils) and the aforementioned skew-join processor. 
> Tests showed that appending via the builder class took 57% less time than 
> using the + operator (122 milliseconds versus 284 milliseconds). A builder 
> is preferred over the + operator because
> String third = first + second;
> gets compiled to:
> StringBuilder builder = new StringBuilder( first );
> builder.append( second );
> third = builder.toString();
> Therefore, building complex strings with the + operator, for example inside 
> loops, requires many instantiations (and as discussed below, creating new 
> objects inside loops is inefficient).
> 2. Use Arrays.asList instead of copying loops - The asList method of Java's 
> java.util.Arrays class is more efficient at creating lists from arrays than 
> using loops to manually iterate over the elements (asList is computationally 
> very cheap, O(1), as it merely creates a wrapper object around the array; 
> the loop however has a complexity of O(n) since a new list is created and 
> every element in the array is added to this new list). As confirmed by 
> the experiment detailed in Appendix D, the Java compiler does not 
> automatically optimize and replace tight-loop copying with asList: the 
> loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is 
> instant. 
> Four instances of this optimization can be applied to Hive's codebase (two of 
> these should be applied to the Map-Join container - MapJoinRowContainer) - 
> lines 92 to 98:
>  for (obj = other.first(); obj != null; obj = other.next()) {
>       ArrayList<Object> ele = new ArrayList(obj.length);
>       for (int i = 0; i < obj.length; i++) {
>         ele.add(obj[i]);
>       }
>       list.add((Row) ele);
>     }
> 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation 
> could be avoided by simply using the provided static conversion methods. As 
> noted in the PMD documentation, "using these avoids the cost of creating 
> objects that also need to be garbage-collected later."
> For example, line 587 of the SemanticAnalyzer class could be replaced by the 
> more efficient parseDouble method call:
> // Inefficient:
> Double percent = Double.valueOf(value).doubleValue();
> // To be replaced by:
> Double percent = Double.parseDouble(value);
> Our test case in Appendix D confirms this: converting 10,000 strings into 
> integers by creating an unnecessary wrapper object took 119 on average; 
> using Integer.parseInt(gen.nextSessionId()) took only 38. Avoiding even just 
> one unnecessary wrapper object therefore cut the conversion time by about 
> 68%.
> 4. Converting values to strings using + "" - Converting values to strings 
> using + "" is quite inefficient (see Appendix D) and should be done by 
> calling the toString() method instead: converting 1,000,000 integers to 
> strings using + "" took, on average, 1340 milliseconds whilst using the 
> toString() method only required 1183 milliseconds (hence toString() takes 
> nearly 12% less time). 
> 89 instances of + "" being used for such conversions were found in Hive's 
> codebase - one of these is found in JoinUtil.
> 5. Avoid manual copying of arrays - Instead of copying arrays as is done in 
> GroupByOperator on line 1040 (see below), the more efficient System.arraycopy 
> can be used (arraycopy is a native method meaning that the entire memory 
> block is copied using memcpy or memmove).
> // Line 1040 of the GroupByOperator
> for (int i = 0; i < keys.length; i++) {
>       forwardCache[i] = keys[i];
> }   
> Using System.arraycopy on an array of 10,000 strings was (close to) instant 
> whilst the manual copy took 6 milliseconds.
> 11 instances of this optimization should be applied to the Hive codebase.
> 6. Avoiding instantiation inside loops - As noted in the PMD documentation, 
> "new objects created within loops should be checked to see if they can 
> created outside them and reused."
> Declaring variables inside a loop (i from 0 to 10,000) took 300 milliseconds 
> whilst declaring them outside took only 88 milliseconds. This can be 
> explained by the fact that when declaring a variable outside the loop, its 
> reference will be re-used for each iteration, whereas when declaring 
> variables inside a loop, new references will be created for each iteration. 
> In our case, 10,000 references will be created by the time that this loop 
> finishes, meaning lots of work in terms of memory allocation and garbage 
> collection. 1623 instances of this optimization can be applied.
> To summarize, I propose to modify the code to address issue 1 and issue 6 
> (remaining issues (2 - 5) will be addressed later). Details are specified as 
> sub-tasks.
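
Issues 1 and 6 from the description above can be sketched as follows. This is
a minimal illustration, not code from Hive: ConcatSketch, joinWithPlus and
joinWithBuilder are hypothetical names, and the column list is made up. It
only shows that the two approaches produce the same string while the second
creates a single builder outside the loop.

```java
import java.util.Arrays;
import java.util.List;

final class ConcatSketch {
    // Issue 1: every += inside the loop builds a brand-new String.
    static String joinWithPlus(List<String> parts) {
        String out = "";
        for (String p : parts) {
            out += p + ",";
        }
        return out;
    }

    // Issues 1 and 6: one StringBuilder, created once outside the loop
    // and reused for every append.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("key", "value", "ds");
        // Both variants are expected to yield the same text.
        System.out.println(joinWithPlus(cols).equals(joinWithBuilder(cols)));
    }
}
```

StringBuilder is used here rather than StringBuffer because the builder is
confined to one method; StringBuffer would matter only if it were shared
between threads.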

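The remaining issues (2 to 5) can be sketched in the same spirit. Again this
is only an illustration under assumptions: CopySketch and its method names
are hypothetical, and the variable names obj, keys and forwardCache are
borrowed from the snippets quoted in the description, not taken from the
actual Hive classes.

```java
import java.util.Arrays;
import java.util.List;

final class CopySketch {
    // Issue 2: wrap the array in a fixed-size List view instead of
    // copying it element by element into a new ArrayList.
    static List<Object> asRow(Object[] obj) {
        return Arrays.asList(obj);
    }

    // Issue 3: parse straight to a primitive; no intermediate Double
    // wrapper is created as long as the result stays a double.
    static double parsePercent(String value) {
        return Double.parseDouble(value);
    }

    // Issue 4: explicit conversion instead of i + "".
    static String intToString(int i) {
        return Integer.toString(i);
    }

    // Issue 5: bulk copy instead of a manual for loop.
    static Object[] copyKeys(Object[] keys) {
        Object[] forwardCache = new Object[keys.length];
        System.arraycopy(keys, 0, forwardCache, 0, keys.length);
        return forwardCache;
    }

    public static void main(String[] args) {
        Object[] row = {"a", 1, 2.5};
        System.out.println(asRow(row));
        System.out.println(parsePercent("12.5"));
        System.out.println(intToString(42));
        System.out.println(Arrays.toString(copyKeys(row)));
    }
}
```

Two caveats: Arrays.asList returns a fixed-size view backed by the array, so
it is not a drop-in replacement where a growable or independent ArrayList is
required; and assigning the result of parseDouble to a Double variable boxes
it again, so the saving only holds when the target is a primitive double.
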
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
