[jira] [Commented] (HIVE-8855) Automatic calculate reduce number for spark job [Spark Branch]

2014-11-30 Thread yuemeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229385#comment-14229385
 ] 

yuemeng commented on HIVE-8855:
---

Hi Xuefu, forgive me. This is my first time using JIRA and I did not know the 
right way to do this. I will not do it again; I will use the proper channels 
to ask for help or to contribute.

> Automatic calculate reduce number for spark job [Spark Branch]
> --
>
> Key: HIVE-8855
> URL: https://issues.apache.org/jira/browse/HIVE-8855
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Jimmy Xiang
>  Labels: Spark-M3
> Fix For: spark-branch
>
> Attachments: HIVE-8855.1-spark.patch, HIVE-8855.2-spark.patch, 
> HIVE-8855.3-spark.patch, HIVE-8855.3-spark.patch
>
>
> As the follow-up work of HIVE-8649, we should enable automatic reduce number 
> calculation for both the local Spark client and the remote Spark client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7329) Create SparkWork [Spark Branch]

2014-11-25 Thread yuemeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225635#comment-14225635
 ] 

yuemeng commented on HIVE-7329:
---

Hi, Xuefu. I built Hive on Spark (spark branch from 
https://github.com/apache/hive.git) and Spark (master branch from 
https://github.com/apache/spark.git). My Spark assembly jar is 
spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar, and I set its path in 
hive-env.sh (as HIVE_AUX_JARS_PATH). I then started Hive and ran the following 
commands before issuing a query:
set hive.execution.engine=spark;
set spark.master=spark://:7077;
set spark.eventLog.enabled=true; 
set spark.executor.memory=1024m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
However, it still seems to use MR as the query engine. I attached a remote 
debugger and found that execution never reaches the Spark engine.
I followed the steps in 
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
Can you tell me what I am doing wrong?
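For reference, this is roughly what I added to hive-env.sh (the assembly path 
is just an example from my own build tree, not a required location):
{code}
# hive-env.sh -- make the locally built Spark assembly visible to Hive
# (example path from my environment; adjust to wherever the jar was built)
export HIVE_AUX_JARS_PATH=/path/to/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar
{code}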

> Create SparkWork [Spark Branch]
> ---
>
> Key: HIVE-7329
> URL: https://issues.apache.org/jira/browse/HIVE-7329
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 0.13.1
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
> Attachments: HIVE-7329.patch
>
>
> This class encapsulates all the work objects that can be executed in a single 
> Spark job.
> NO PRECOMMIT TESTS. This is for spark branch only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8855) Automatic calculate reduce number for spark job[Spark Branch]

2014-11-25 Thread yuemeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224278#comment-14224278
 ] 

yuemeng commented on HIVE-8855:
---

Hi, I built a Hive on Spark package (the Hive source from the spark branch of 
https://github.com/apache/hive.git) and built Spark (master branch from 
https://github.com/apache/spark.git). My assembly jar is 
spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar, and I configured its path in 
hive-env.sh (export HIVE_AUX_JARS_PATH). I started the metastore server and 
then ran these commands in the Hive shell:
set hive.execution.engine=spark;
set spark.master=spark://xxx.xxx.x.x:7077;
set spark.eventLog.enabled=true; 
set spark.executor.memory=1024m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
Then I started a query, but it does not seem to switch to the Spark engine; 
the log still shows output like:
ication_time: 1416935573226 access_time: 0 block_replication: 0 blocksize: 0 
fileId: 31127 childrenNum: 0 }}
14/11/26 01:12:53 [main]: INFO session.SessionState: No Tez session required at 
this point. hive.execution.engine=mr.
I think the query still runs on MR and I do not know why; the Spark web UI 
shows no activity while the query runs in Hive. Can you tell me why?
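In case it is relevant, this is how I double-checked the effective value in 
the same session (issuing set with only the property name echoes its current 
value):
{code}
-- run inside the same Hive CLI session, after the set commands above
set hive.execution.engine;
-- I would expect hive.execution.engine=spark here if the switch took effect
{code}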


> Automatic calculate reduce number for spark job[Spark Branch]
> -
>
> Key: HIVE-8855
> URL: https://issues.apache.org/jira/browse/HIVE-8855
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Jimmy Xiang
>  Labels: Spark-M3
> Fix For: spark-branch
>
> Attachments: HIVE-8855.1-spark.patch, HIVE-8855.2-spark.patch
>
>
> As the follow-up work of HIVE-8649, we should enable automatic reduce number 
> calculation for both the local Spark client and the remote Spark client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7292) Hive on Spark

2014-11-23 Thread yuemeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222599#comment-14222599
 ] 

yuemeng commented on HIVE-7292:
---

I am very interested in Hive on Spark and tried to use it. I built it 
(downloaded from https://github.com/apache/hive.git, spark branch) with Maven 
using the command: mvn package -DskipTests -Phadoop-2 -Pdist, but it gave me 
errors like:
[ERROR] 
/home/ym/hive-on-spark/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:[22,24]
 cannot find symbol
[ERROR] symbol:   class JobExecutionStatus
[ERROR] location: package org.apache.spark
[ERROR] 
/home/ym/hive-on-spark/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:[33,10]
 cannot find symbol
[ERROR] symbol:   class JobExecutionStatus
[ERROR] location: interface 
org.apache.hadoop.hive.ql.exec.spark.status.SparkJobStatus
[ERROR] 
/home/ym/hive-on-spark/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java:[31,24]
 cannot find symbol
[ERROR] symbol:   class JobExecutionStatus
Can you tell me why?
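Just a guess on my side: JobExecutionStatus only exists in newer Spark code, 
so maybe my build is resolving an older org.apache.spark artifact instead of 
the Spark I built locally. This is what I plan to try (the paths are from my 
own environment and the fix itself is only an assumption, not confirmed):
{code}
# assumption: install the locally built Spark into the local Maven repository
# so the Hive spark-branch build can resolve org.apache.spark.JobExecutionStatus
cd /path/to/spark
mvn install -DskipTests -Phadoop-2.4 -Dhadoop.version=2.4.0

# then rebuild Hive against it
cd /path/to/hive
mvn clean package -DskipTests -Phadoop-2 -Pdist
{code}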

> Hive on Spark
> -
>
> Key: HIVE-7292
> URL: https://issues.apache.org/jira/browse/HIVE-7292
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>  Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5
> Attachments: Hive-on-Spark.pdf
>
>
> Spark as an open-source data analytics cluster computing framework has gained 
> significant momentum recently. Many Hive users already have Spark installed 
> as their computing backbone. To take advantage of Hive, they still need to 
> have either MapReduce or Tez on their cluster. This initiative will provide 
> users a new alternative so that they can consolidate their backend. 
> Secondly, providing such an alternative further increases Hive's adoption as 
> it exposes Spark users to a viable, feature-rich, de facto standard SQL tool 
> on Hadoop.
> Finally, allowing Hive to run on Spark also has performance benefits. Hive 
> queries, especially those involving multiple reducer stages, will run faster, 
> thus improving user experience as Tez does.
> This is an umbrella JIRA which will cover many coming subtasks. The design 
> doc will be attached here shortly, and will be on the wiki as well. Feedback 
> from the community is greatly appreciated!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8951) Spark remote context doesn't work with local-cluster [Spark Branch]

2014-11-23 Thread yuemeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222583#comment-14222583
 ] 

yuemeng commented on HIVE-8951:
---

Hi, when I run a SQL query to test Hive on Spark, it does not seem to switch 
to the Spark engine. Does Hive on Spark work reliably at this point?

> Spark remote context doesn't work with local-cluster [Spark Branch]
> ---
>
> Key: HIVE-8951
> URL: https://issues.apache.org/jira/browse/HIVE-8951
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>
> What I did:
> {code}
> set spark.home=/home/xzhang/apache/spark;
> set spark.master=local-cluster[2,1,2048];
> set hive.execution.engine=spark; 
> set spark.executor.memory=2g;
> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
> set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
> select name, avg(value) as v from dec group by name order by v;
> {code}
> Exceptions seen:
> {code}
> 14/11/23 10:42:15 INFO Worker: Spark home: /home/xzhang/apache/spark
> 14/11/23 10:42:15 INFO AppClient$ClientActor: Connecting to master 
> spark://xzdt.local:55151...
> 14/11/23 10:42:15 INFO Master: Registering app Hive on Spark
> 14/11/23 10:42:15 INFO Master: Registered app Hive on Spark with ID 
> app-20141123104215-
> 14/11/23 10:42:15 INFO SparkDeploySchedulerBackend: Connected to Spark 
> cluster with app ID app-20141123104215-
> 14/11/23 10:42:15 INFO NettyBlockTransferService: Server created on 41676
> 14/11/23 10:42:15 INFO BlockManagerMaster: Trying to register BlockManager
> 14/11/23 10:42:15 INFO BlockManagerMasterActor: Registering block manager 
> xzdt.local:41676 with 265.0 MB RAM, BlockManagerId(, xzdt.local, 
> 41676)
> 14/11/23 10:42:15 INFO BlockManagerMaster: Registered BlockManager
> 14/11/23 10:42:15 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready 
> for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
> 14/11/23 10:42:20 WARN AbstractLifeCycle: FAILED 
> SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already 
> in use
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:174)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)
>   at 
> org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
>   at 
> org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
>   at 
> org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
>   at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
>   at org.eclipse.jetty.server.Server.doStart(Server.java:293)
>   at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
>   at 
> org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:194)
>   at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:204)
>   at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:204)
>   at 
> org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1676)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>   at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1667)
>   at 
> org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:204)
>   at org.apache.spark.ui.WebUI.bind(WebUI.scala:102)
>   at 
> org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:267)
>   at 
> org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:267)
>   at scala.Option.foreach(Option.scala:236)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:267)
>   at 
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
>   at 
> org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:106)
>   at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:362)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:616)
>   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:353)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 14/11/23 10:42:20 WARN AbstractLifeCycle: FAILED 
> org.eclipse.jetty.serve