[jira] [Commented] (HIVE-7292) Hive on Spark

2015-02-18 Thread Peter Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326302#comment-14326302
 ] 

Peter Lin commented on HIVE-7292:
-

Would love to use this in production. Is it going to be released in Hive 0.15?

 Hive on Spark
 -

 Key: HIVE-7292
 URL: https://issues.apache.org/jira/browse/HIVE-7292
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
  Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5
 Attachments: Hive-on-Spark.pdf


 Spark, as an open-source data analytics cluster computing framework, has gained
 significant momentum recently. Many Hive users already have Spark installed
 as their computing backbone. To take advantage of Hive, they still need to
 have either MapReduce or Tez on their cluster. This initiative will provide
 users a new alternative so that they can consolidate their backends.
 Secondly, providing such an alternative further increases Hive's adoption, as
 it exposes Spark users to a viable, feature-rich, de facto standard SQL tool
 on Hadoop.
 Finally, allowing Hive to run on Spark also has performance benefits. Hive
 queries, especially those involving multiple reducer stages, will run faster,
 thus improving the user experience, as Tez does.
 This is an umbrella JIRA which will cover many upcoming subtasks. A design doc
 will be attached here shortly, and will be on the wiki as well. Feedback from
 the community is greatly appreciated!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7292) Hive on Spark

2015-02-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326336#comment-14326336
 ] 

Xuefu Zhang commented on HIVE-7292:
---

Formerly 0.15, now 1.1, which is going to be released soon. A release candidate is out.

[jira] [Commented] (HIVE-7292) Hive on Spark

2015-02-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326675#comment-14326675
 ] 

Lefty Leverenz commented on HIVE-7292:
--

Doc note: See comments on HIVE-9257 and HIVE-9448 for documentation issues.

* [HIVE-9257 commit comment with doc notes | https://issues.apache.org/jira/browse/HIVE-9257?focusedCommentId=14273166page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14273166]
* HIVE-9448 doc comments
** [list of configuration parameters | https://issues.apache.org/jira/browse/HIVE-9448?focusedCommentId=14292487page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14292487]
** [where documented | https://issues.apache.org/jira/browse/HIVE-9448?focusedCommentId=14298353page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14298353]

[jira] [Commented] (HIVE-7292) Hive on Spark

2014-12-04 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234264#comment-14234264
 ] 

Xuefu Zhang commented on HIVE-7292:
---

[~libing], I assume you assigned this JIRA to yourself by mistake. However, let 
me know if you plan to work on this. Thanks.

 Hive on Spark
 -

 Key: HIVE-7292
 URL: https://issues.apache.org/jira/browse/HIVE-7292
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Bing Li
  Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5
 Attachments: Hive-on-Spark.pdf


[jira] [Commented] (HIVE-7292) Hive on Spark

2014-11-23 Thread yuemeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222599#comment-14222599
 ] 

yuemeng commented on HIVE-7292:
---

I am very interested in Hive on Spark and am trying to use it. When I built it
(downloaded from https://github.com/apache/hive.git, choosing the spark branch)
with Maven using the command mvn package -DskipTests -Phadoop-2 -Pdist, it
gave me errors like:
[ERROR] /home/ym/hive-on-spark/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:[22,24] cannot find symbol
[ERROR] symbol:   class JobExecutionStatus
[ERROR] location: package org.apache.spark
[ERROR] /home/ym/hive-on-spark/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:[33,10] cannot find symbol
[ERROR] symbol:   class JobExecutionStatus
[ERROR] location: interface org.apache.hadoop.hive.ql.exec.spark.status.SparkJobStatus
[ERROR] /home/ym/hive-on-spark/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java:[31,24] cannot find symbol
[ERROR] symbol:   class JobExecutionStatus
Can you tell me why?

[jira] [Commented] (HIVE-7292) Hive on Spark

2014-11-23 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222600#comment-14222600
 ] 

Xuefu Zhang commented on HIVE-7292:
---

[~yuemeng], you can try removing the org/apache/spark folder in your local Maven
repo to see if that fixes it.
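The suggestion above can also be scripted. This is a minimal sketch, not part of Hive or Maven: the helper name `purge_cached_spark_artifacts` is hypothetical, it assumes Maven's default local repository at ~/.m2/repository, and it assumes the `cannot find symbol` errors come from a stale Spark artifact cached there. In a shell, `rm -rf ~/.m2/repository/org/apache/spark` does the same thing.

```python
import shutil
from pathlib import Path
from typing import Optional


def purge_cached_spark_artifacts(repo_root: Optional[Path] = None) -> bool:
    """Delete the org/apache/spark subtree of a local Maven repository.

    Hypothetical helper for illustration. Returns True if the directory
    existed and was removed, False if there was nothing to remove.
    """
    if repo_root is None:
        # Assumption: Maven's default local repository location. Pass another
        # path if settings.xml configures a different <localRepository>.
        repo_root = Path.home() / ".m2" / "repository"
    spark_dir = Path(repo_root) / "org" / "apache" / "spark"
    if not spark_dir.is_dir():
        # No cached Spark artifacts, so nothing to delete.
        return False
    # Remove every cached Spark artifact; other groups (e.g. org/apache/hadoop)
    # are left untouched.
    shutil.rmtree(spark_dir)
    return True
```

After calling `purge_cached_spark_artifacts()`, rerunning `mvn package -DskipTests -Phadoop-2 -Pdist` (when not in offline mode) should make Maven resolve the Spark dependencies again from the remote repositories instead of reusing the cached copies.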

[jira] [Commented] (HIVE-7292) Hive on Spark

2014-11-12 Thread Kiran Lonikar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14209307#comment-14209307
 ] 

Kiran Lonikar commented on HIVE-7292:
-

Sorry, I have not looked at the code, but I want to know how the RDD is
structured. Is it columnar? I am specifically interested in how you preserve
the columnar structure of ORC, RC, and Parquet files. An RDD is by nature
row-wise, and SchemaRDD more specifically so.

The Spark SQL component uses SchemaRDD, which is row-wise. Just to be clear, I
am not reporting any problems with this JIRA. I am interested in knowing how it
is implemented.

I think a columnar structure has its advantages, and that's what Hive
vectorization did (https://issues.apache.org/jira/browse/HIVE-4160). The earlier
SQL implementation, Shark, also had some kind of columnar structure. I am not
sure whether this Hive on Spark work preserves it.
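The question above turns on the difference between row-wise and columnar layouts. As a purely conceptual sketch (plain Python with a hypothetical three-column schema; it does not reflect how SchemaRDD, Hive vectorization, or Hive on Spark actually store data), the same batch can be held either as a list of row tuples or as one list per column, and a single-column aggregate only needs to read one column in the columnar form:

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical schema used only for illustration.
COLUMNS = ("id", "name", "price")
Row = Tuple[int, str, float]


def to_columnar(rows: Sequence[Row]) -> Dict[str, List]:
    """Transpose a batch of row tuples into one list per column."""
    columns: Dict[str, List] = {name: [] for name in COLUMNS}
    for row in rows:
        for name, value in zip(COLUMNS, row):
            columns[name].append(value)
    return columns


def sum_price_row_wise(rows: Sequence[Row]) -> float:
    # Row-wise: every record is visited, even though only one field is needed.
    price_idx = COLUMNS.index("price")
    return sum(row[price_idx] for row in rows)


def sum_price_columnar(columns: Dict[str, List]) -> float:
    # Columnar: only the "price" column is read; "id" and "name" are never touched.
    return sum(columns["price"])


rows: List[Row] = [(1, "a", 2.5), (2, "b", 4.0), (3, "c", 1.5)]
cols = to_columnar(rows)
```

Both functions return the same total; the difference is which data each one has to touch, which is the property columnar file formats such as ORC and Parquet are designed to exploit.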


[jira] [Commented] (HIVE-7292) Hive on Spark

2014-10-28 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186862#comment-14186862
 ] 

Paulo Motta commented on HIVE-7292:
---

Is the branch already usable in production?

[jira] [Commented] (HIVE-7292) Hive on Spark

2014-10-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186891#comment-14186891
 ] 

Xuefu Zhang commented on HIVE-7292:
---

[~pauloricardomg], thanks for your interest. I think the branch is ready for
prospective users to try out, but for production I'd recommend waiting for a
formal release.

[jira] [Commented] (HIVE-7292) Hive on Spark

2014-07-22 Thread wangmeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14070044#comment-14070044
 ] 

wangmeng commented on HIVE-7292:


This is a very valuable project!

 Hive on Spark
 -

 Key: HIVE-7292
 URL: https://issues.apache.org/jira/browse/HIVE-7292
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: Hive-on-Spark.pdf


--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7292) Hive on Spark

2014-07-01 Thread niraj rai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048936#comment-14048936
 ] 

niraj rai commented on HIVE-7292:
-

I am out of office (OOO), so replies to email might be delayed. Please reach out
to me at (408) 799-8605 if you need something urgent.
Regards
Niraj



 Hive on Spark
 -

 Key: HIVE-7292
 URL: https://issues.apache.org/jira/browse/HIVE-7292
 Project: Hive
  Issue Type: Improvement
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: Hive-on-Spark.pdf


[jira] [Commented] (HIVE-7292) Hive on Spark

2014-07-01 Thread niraj rai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049138#comment-14049138
 ] 

niraj rai commented on HIVE-7292:
-

I am out of office (OOO), so replies to email might be delayed.

