[ https://issues.apache.org/jira/browse/HIVE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793254#comment-13793254 ]

Justin commented on HIVE-4693:
------------------------------

We have several skewed dimensions, and not being able to use the skew join 
optimization causes severe performance degradation. I confirmed that the skew 
is the cause by rerunning with the skewed data removed.

Is there a workaround? I've tried setting hive.skewjoin.key to no avail.


> If you set hive.optimize.skewjoin=true, and number of identical keys is < hive.skewjoin.key don't fail with FileNotFoundException
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4693
>                 URL: https://issues.apache.org/jira/browse/HIVE-4693
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Robert Justice
>
> We would like to set hive.optimize.skewjoin to true so that it kicks in when 
> skew is encountered, but if no key reaches the hive.skewjoin.key threshold, 
> the query crashes because hive_skew_join_bigkeys_0 cannot be found. Could we 
> just bail out and fall back to a standard join rather than failing?
> Ended Job = job_201306061640_0003
> java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive-rjustice/hive_2013-06-07_10-02-03_755_605133549375679913/-mr-10003/hive_skew_join_bigkeys_0 does not exist.
>       at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:410)
>       at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:96)
>       at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
>       at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>       at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>       at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1374)
>       at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1160)
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:973)
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
>       at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>       at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>       at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>       at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>       at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Ended Job = 672729995, job is filtered out (removed at runtime).
> MapReduce Jobs Launched: 
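The fallback the reporter asks for can be sketched as follows. This is a minimal, hypothetical illustration, not Hive's code: java.nio.file stands in for HDFS, and the names SkewJoinFallbackSketch, spillSkewedKeys and resolveSkewJoinTasks are invented for the example. It assumes, as the report suggests, that the big-keys directory is only created when some key's row count exceeds hive.skewjoin.key, so a resolver should treat a missing directory as "no skewed keys" instead of listing it and throwing FileNotFoundException the way the ConditionalResolverSkewJoin.getTasks -> listStatus frames in the trace do.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SkewJoinFallbackSketch {

    // Hypothetical stand-in for the join's reduce side: any key whose row count
    // exceeds the threshold (the role hive.skewjoin.key plays) gets a marker file
    // under bigKeysDir. The directory is created only if some key is skewed.
    static void spillSkewedKeys(Map<String, Integer> keyCounts, int threshold, Path bigKeysDir)
            throws IOException {
        for (Map.Entry<String, Integer> e : keyCounts.entrySet()) {
            if (e.getValue() > threshold) {
                Files.createDirectories(bigKeysDir);
                Files.write(bigKeysDir.resolve("key_" + e.getKey()),
                        String.valueOf(e.getValue()).getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    // Hypothetical resolver: rather than listing a directory that may not exist
    // (the listStatus call that throws in the trace), check first and return no
    // follow-up tasks, so the result of the standard join is kept as-is.
    static List<String> resolveSkewJoinTasks(Path bigKeysDir) throws IOException {
        if (!Files.isDirectory(bigKeysDir)) {
            return Collections.emptyList();
        }
        try (Stream<Path> entries = Files.list(bigKeysDir)) {
            return entries.map(p -> "skew-join-task:" + p.getFileName())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path bigKeys = Files.createTempDirectory("skewjoin-sketch")
                .resolve("hive_skew_join_bigkeys_0");

        // No key exceeds the threshold: the directory is never created and the
        // resolver falls back instead of throwing.
        spillSkewedKeys(Map.of("a", 3, "b", 5), 100, bigKeys);
        System.out.println(resolveSkewJoinTasks(bigKeys)); // prints []

        // One key exceeds the threshold: one follow-up task is scheduled for it.
        spillSkewedKeys(Map.of("a", 3, "hot", 500), 100, bigKeys);
        System.out.println(resolveSkewJoinTasks(bigKeys)); // prints [skew-join-task:key_hot]
    }
}
```

The design choice here is to make "directory absent" and "directory empty" mean the same thing to the resolver, which is what lets the query degrade to a plain join when the data turns out not to be skewed.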



--
This message was sent by Atlassian JIRA
(v6.1#6144)
