[jira] [Commented] (HIVE-8855) Automatically calculate reduce number for Spark job [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229385#comment-14229385 ]

yuemeng commented on HIVE-8855:
-------------------------------

Hi, Xuefu. Forgive me! This is my first time using JIRA, and I did not know the right way to do things. I will not do this anymore; I will find the proper way to ask for help or to contribute.

> Automatically calculate reduce number for Spark job [Spark Branch]
> ------------------------------------------------------------------
>
>                 Key: HIVE-8855
>                 URL: https://issues.apache.org/jira/browse/HIVE-8855
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Chengxiang Li
>            Assignee: Jimmy Xiang
>              Labels: Spark-M3
>             Fix For: spark-branch
>
>         Attachments: HIVE-8855.1-spark.patch, HIVE-8855.2-spark.patch, HIVE-8855.3-spark.patch, HIVE-8855.3-spark.patch
>
>
> As the follow-up work to HIVE-8649, we should enable automatic reduce number calculation for both the local Spark client and the remote Spark client.
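For background on the "automatic calculation" this sub-task ports to the Spark clients: on MapReduce, Hive already estimates the reducer count from the input size using long-standing configuration knobs. A minimal sketch of those standard properties (the values are illustrative, not anything introduced by this patch):

{code}
set hive.exec.reducers.bytes.per.reducer=256000000; -- target bytes handled per reducer
set hive.exec.reducers.max=1009;                    -- upper bound on the estimated count
set mapreduce.job.reduces=-1;                       -- -1 tells Hive to estimate rather than fix the count
{code}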
[jira] [Commented] (HIVE-7329) Create SparkWork [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225635#comment-14225635 ]

yuemeng commented on HIVE-7329:
-------------------------------

Hi, Xuefu. I built Hive on Spark (the spark branch of https://github.com/apache/hive.git) and Spark (the master branch of https://github.com/apache/spark.git). My Spark assembly jar is spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar, and I set its path in hive-env.sh (as HIVE_AUX_JARS_PATH). I then started Hive and ran the following commands before starting a query:

set hive.execution.engine=spark;
set spark.master=spark://:7077;
set spark.eventLog.enabled=true;
set spark.executor.memory=1024m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;

But it seems Hive still uses MR as the query engine. I attached a remote debugger and found that execution never jumps to the Spark engine. I followed the steps described at https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started. Can you tell me what is wrong?

> Create SparkWork [Spark Branch]
> -------------------------------
>
>                 Key: HIVE-7329
>                 URL: https://issues.apache.org/jira/browse/HIVE-7329
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: 0.13.1
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: spark-branch
>
>         Attachments: HIVE-7329.patch
>
>
> This class encapsulates all the work objects that can be executed in a single Spark job.
> NO PRECOMMIT TESTS. This is for spark branch only.
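One quick sanity check for this symptom (plain Hive CLI behavior, nothing specific to the spark branch): a bare set with no value echoes the property back, which confirms whether the assignment actually took effect in the session:

{code}
set hive.execution.engine;
-- prints the session's current value; "hive.execution.engine=mr" here
-- means the earlier assignment never took effect
{code}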
[jira] [Commented] (HIVE-8855) Automatically calculate reduce number for Spark job [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224278#comment-14224278 ]

yuemeng commented on HIVE-8855:
-------------------------------

Hi, I built a Hive-on-Spark package (Hive source from the spark branch of https://github.com/apache/hive.git) and built Spark (master branch from https://github.com/apache/spark.git). My assembly jar is spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar. I configured this assembly jar path in hive-env.sh (export HIVE_AUX_JARS_PATH), started the metastore server, and then ran these commands in the Hive shell:

set hive.execution.engine=spark;
set spark.master=spark://xxx.xxx.x.x:7077;
set spark.eventLog.enabled=true;
set spark.executor.memory=1024m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;

Then I started a query, but it seems execution can't jump to the Spark engine; the output still looks like this:

modification_time: 1416935573226 access_time: 0 block_replication: 0 blocksize: 0 fileId: 31127 childrenNum: 0
14/11/26 01:12:53 [main]: INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.

I think it still uses MR for the query, but I don't know why, and in the Spark web UI I can't find any info while the query runs in Hive. Can you tell me why?

> Automatically calculate reduce number for Spark job [Spark Branch]
> ------------------------------------------------------------------
>
>                 Key: HIVE-8855
>                 URL: https://issues.apache.org/jira/browse/HIVE-8855
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Chengxiang Li
>            Assignee: Jimmy Xiang
>              Labels: Spark-M3
>             Fix For: spark-branch
>
>         Attachments: HIVE-8855.1-spark.patch, HIVE-8855.2-spark.patch
>
>
> As the follow-up work to HIVE-8649, we should enable automatic reduce number calculation for both the local Spark client and the remote Spark client.
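The SessionState log line is the telltale: the session still resolved hive.execution.engine to mr when it started. Besides the per-session set command, the engine can also be pinned in hive-site.xml (a standard Hive property; the snippet is a minimal sketch, not part of this patch):

{code}
<!-- hive-site.xml: make spark the default engine for every session -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
{code}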
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222599#comment-14222599 ]

yuemeng commented on HIVE-7292:
-------------------------------

I am very interested in Hive on Spark and tried to use it. When I built it (downloaded from https://github.com/apache/hive.git, spark branch) using Maven with the command mvn package -DskipTests -Phadoop-2 -Pdist, it gave me errors like:

[ERROR] /home/ym/hive-on-spark/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:[22,24] cannot find symbol
[ERROR] symbol: class JobExecutionStatus
[ERROR] location: package org.apache.spark
[ERROR] /home/ym/hive-on-spark/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:[33,10] cannot find symbol
[ERROR] symbol: class JobExecutionStatus
[ERROR] location: interface org.apache.hadoop.hive.ql.exec.spark.status.SparkJobStatus
[ERROR] /home/ym/hive-on-spark/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java:[31,24] cannot find symbol
[ERROR] symbol: class JobExecutionStatus

Can you tell me why?

> Hive on Spark
> -------------
>
>                 Key: HIVE-7292
>                 URL: https://issues.apache.org/jira/browse/HIVE-7292
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>              Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5
>         Attachments: Hive-on-Spark.pdf
>
>
> Spark, an open-source data analytics cluster computing framework, has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that they can consolidate their backend.
> Secondly, providing such an alternative further increases Hive's adoption, as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop.
> Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does.
> This is an umbrella JIRA which will cover many coming subtasks. The design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated!
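For what it's worth, org.apache.spark.JobExecutionStatus only exists in Spark 1.2 and later, so this error usually means the build resolved an older Spark artifact from the local Maven repository. A hedged sketch of the usual remedy (both checkout paths are placeholders):

{code}
# install the Spark 1.2 snapshot into the local Maven repository first
cd /path/to/spark     # placeholder: Spark checkout, master branch
mvn install -DskipTests -Dhadoop.version=2.4.0
# then rebuild the Hive spark branch against it
cd /path/to/hive      # placeholder: Hive checkout, spark branch
mvn clean package -DskipTests -Phadoop-2 -Pdist
{code}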
[jira] [Commented] (HIVE-8951) Spark remote context doesn't work with local-cluster [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222583#comment-14222583 ]

yuemeng commented on HIVE-8951:
-------------------------------

Hi, when I run a SQL query to test Hive on Spark, it seems that execution can't jump to the Spark engine. Does Hive on Spark work well at this point?

> Spark remote context doesn't work with local-cluster [Spark Branch]
> -------------------------------------------------------------------
>
>                 Key: HIVE-8951
>                 URL: https://issues.apache.org/jira/browse/HIVE-8951
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>
> What I did:
> {code}
> set spark.home=/home/xzhang/apache/spark;
> set spark.master=local-cluster[2,1,2048];
> set hive.execution.engine=spark;
> set spark.executor.memory=2g;
> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
> set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
> select name, avg(value) as v from dec group by name order by v;
> {code}
> Exceptions seen:
> {code}
> 14/11/23 10:42:15 INFO Worker: Spark home: /home/xzhang/apache/spark
> 14/11/23 10:42:15 INFO AppClient$ClientActor: Connecting to master spark://xzdt.local:55151...
> 14/11/23 10:42:15 INFO Master: Registering app Hive on Spark
> 14/11/23 10:42:15 INFO Master: Registered app Hive on Spark with ID app-20141123104215-
> 14/11/23 10:42:15 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141123104215-
> 14/11/23 10:42:15 INFO NettyBlockTransferService: Server created on 41676
> 14/11/23 10:42:15 INFO BlockManagerMaster: Trying to register BlockManager
> 14/11/23 10:42:15 INFO BlockManagerMasterActor: Registering block manager xzdt.local:41676 with 265.0 MB RAM, BlockManagerId(<driver>, xzdt.local, 41676)
> 14/11/23 10:42:15 INFO BlockManagerMaster: Registered BlockManager
> 14/11/23 10:42:15 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
> 14/11/23 10:42:20 WARN AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use
> java.net.BindException: Address already in use
>         at sun.nio.ch.Net.bind0(Native Method)
>         at sun.nio.ch.Net.bind(Net.java:174)
>         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)
>         at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
>         at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
>         at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
>         at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
>         at org.eclipse.jetty.server.Server.doStart(Server.java:293)
>         at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
>         at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:194)
>         at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:204)
>         at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:204)
>         at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1676)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>         at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1667)
>         at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:204)
>         at org.apache.spark.ui.WebUI.bind(WebUI.scala:102)
>         at org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:267)
>         at org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:267)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:267)
>         at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
>         at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:106)
>         at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:362)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:616)
>         at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:353)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 14/11/23 10:42:20 WARN AbstractLifeCycle: FAILED org.eclipse.jetty.serve
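The BindException on SelectChannelConnector@0.0.0.0:4040 is the Spark web UI colliding with another SparkContext already bound to that port on the same host. Two standard Spark settings that work around this (generic Spark configuration, not specific to this issue):

{code}
set spark.ui.port=0;           -- 0 lets Spark pick any free port for the web UI
set spark.port.maxRetries=32;  -- or keep 4040 and allow more retries on successor ports
{code}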