[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170837#comment-15170837 ]

Zhong Yanghong commented on KYLIN-1082:

Thanks, Shaofeng. The issue is related to jar pattern matching. In the sandbox, hive-exec and hive-metastore have soft links without version information, while on some platforms no such soft links exist. With Mahong's help, the matching patterns have been corrected. Thanks for helping me apply the patch "fix_auto_hive_tmpjars_1_x_staging.patch".

> Hive dependencies should be add to tmpjars
> ------------------------------------------
>
> Key: KYLIN-1082
> URL: https://issues.apache.org/jira/browse/KYLIN-1082
> Project: Kylin
> Issue Type: Bug
> Components: Environment, Job Engine
> Reporter: liyang
> Assignee: Zhong Yanghong
> Labels: newbie
> Fix For: v2.1, v1.3
>
> Attachments: auto_hive_tmpjars_1_x_staging.patch, auto_hive_tmpjars_2_x_staging.patch, fix_auto_hive_tmpjars_1_x_staging.patch
>
>
> Currently Kylin assumes all data nodes have a Hive deployment at the exact same FS location. However, a better position is to treat Hive as a client-side app.
> Then we need to ship the Hive jars with the MR job every time.
> This makes deploying Kylin a lot easier in clusters that do not have Hive on all data nodes.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
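A minimal sketch of the kind of jar name matching discussed in this comment, assuming the goal is to accept both an unversioned soft link such as hive-exec.jar and a versioned file such as hive-exec-1.2.1.jar. The class name, the regular expression, and the example paths are illustrative assumptions, not the patterns used in the actual patch.

```java
import java.util.regex.Pattern;

final class HiveJarNameMatcher {
    // Hypothetical pattern: a wanted base name, an optional "-<version>" suffix
    // that starts with a digit, then ".jar".
    private static final Pattern HIVE_JAR = Pattern.compile(
            "(hive-exec|hive-metastore|hive-hcatalog-core)(-[0-9][^/]*)?\\.jar");

    /** Returns true when the file name at the end of the path is one of the wanted Hive jars. */
    static boolean isWantedHiveJar(String path) {
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        return HIVE_JAR.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // Example paths are made up for illustration.
        System.out.println(isWantedHiveJar("/usr/lib/hive/lib/hive-exec.jar"));
        System.out.println(isWantedHiveJar("/usr/lib/hive/lib/hive-exec-1.2.1.jar"));
        System.out.println(isWantedHiveJar("/usr/lib/hive/lib/hive-cli-1.2.1.jar"));
    }
}
```

Making the version suffix optional is what lets one pattern cover platforms with and without the soft links.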
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170836#comment-15170836 ]

Shaofeng SHI commented on KYLIN-1082:

It works now, thanks Yanghong!
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170802#comment-15170802 ]

Zhong Yanghong commented on KYLIN-1082:

Hi Shaofeng, this issue was fixed previously, and the related patch was applied to 2.x-staging. However, I forgot to apply it to 1.x-staging. Sorry about this. A related patch will be uploaded soon.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15120806#comment-15120806 ]

wangxianbin commented on KYLIN-1082:

OK, I will open another ticket for HBase. Thanks, Yanghong.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15120798#comment-15120798 ]

Zhong Yanghong commented on KYLIN-1082:

Sorry, I made a mistake. You can open another JIRA for HBase if needed.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119280#comment-15119280 ]

Zhong Yanghong commented on KYLIN-1082:

Maybe the previous statement was not clear. For example, in the class "CubeHFileJob", we import some HBase-related classes:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.KeyValueSortReducer;

They are responsible for converting cuboids to HBase. However, they have been included in the hadoop-common jars.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119271#comment-15119271 ]

wangxianbin commented on KYLIN-1082:

Hi Yanghong, I guess I am missing something here. What do you mean by "hbase related jars have been included in hadoop"? Can you be more specific? Thanks for your patience.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119095#comment-15119095 ]

Zhong Yanghong commented on KYLIN-1082:

Yes, we don't need that, because the HBase-related jars have been included in Hadoop.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118819#comment-15118819 ]

wangxianbin commented on KYLIN-1082:

Hi Yanghong, how could we not need this for HBase? For example, ConvertCuboidToHfileStep is a MapReduce job and it accesses the HBase API.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116484#comment-15116484 ]

wangxianbin commented on KYLIN-1082:

Agreed, thank you, Yanghong.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115043#comment-15115043 ]

Zhong Yanghong commented on KYLIN-1082:

Here, for the first one, we don't need this for HBase dependencies.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115041#comment-15115041 ]

Zhong Yanghong commented on KYLIN-1082:

We have agreed on the current implementation and do not provide the trigger, so as to make users' lives simple. Compared to the data to be processed, the size of the uploaded jars is very small and the cost can be ignored.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114975#comment-15114975 ]

wangxianbin commented on KYLIN-1082:

Hey, guys! Do we need to open a ticket for HBase dependencies, or do we end it here?
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114973#comment-15114973 ]

wangxianbin commented on KYLIN-1082:

I see your point, and yes, your implementation is clear and nice. There is just one flaw: it will broadcast Hive dependencies in every case. However, under the KYLIN-1021 agreement, in some cases, such as clusters that have consistent binaries and configs across all nodes, we do not need to broadcast Hive dependencies, so we can add a flag to disable it. Check KYLIN-1021.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114914#comment-15114914 ]

wangxianbin commented on KYLIN-1082:

I guess following 2.x will be better.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114912#comment-15114912 ]

Zhong Yanghong commented on KYLIN-1082:

I'll keep it the same as 1.x. Thank you very much for your help.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114908#comment-15114908 ]

Zhong Yanghong commented on KYLIN-1082:

Thank you very much. Finally I got it. Then which solution do you think is better: adding the replacement code in the method "getDefaultMapRedClasspath" like 2.x, or doing the replacement like 1.x?
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114905#comment-15114905 ]

wangxianbin commented on KYLIN-1082:

Also, 2.x adds the replacement code into getDefaultMapRedClasspath. You did not follow that, which may cause unexpected errors.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114902#comment-15114902 ]

wangxianbin commented on KYLIN-1082:

Hi Yanghong, please check your patch for 1.x. I guess you followed the 2.x code and deleted the replacement code for DefaultMapRedClasspath in 1.x. Please correct me if I am wrong.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114895#comment-15114895 ]

Zhong Yanghong commented on KYLIN-1082:

I'm sorry. Just change "kylin.job.mr.lib.dir" to "kylin.hive.dependency".
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114888#comment-15114888 ]

wangxianbin commented on KYLIN-1082:

Hi Yanghong, what exactly do you mean by "One is appending the related jars to the property "kylin.job.mr.lib.dir"" in your two ways?
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114875#comment-15114875 ]

Zhong Yanghong commented on KYLIN-1082:

For the first one, converting cuboids to HFiles may use HBase-related jars, but I haven't tested it. I will tell you later whether it's needed or not. For the second, I don't think I deleted the replacement.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114872#comment-15114872 ]

wangxianbin commented on KYLIN-1082:

I think your implementation is great. Just two questions: first, do we need this for HBase dependencies? Second, why delete the "colon-comma" replacement code for DefaultMapRedClasspath?
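For context on the "colon-comma" replacement: a Java classpath on Unix-like systems separates entries with ':', while Hadoop's "tmpjars" property holds a comma-separated list. A minimal sketch of such a conversion follows. The class and method names are hypothetical, and this is not the code from getDefaultMapRedClasspath.

```java
import java.util.ArrayList;
import java.util.List;

final class ClasspathToTmpJars {
    /**
     * Converts a ':'-separated classpath string into a ','-separated string,
     * trimming whitespace and skipping empty entries.
     */
    static String toCommaSeparated(String classpath) {
        List<String> entries = new ArrayList<>();
        for (String entry : classpath.split(":")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return String.join(",", entries);
    }

    public static void main(String[] args) {
        // Example paths are made up for illustration.
        System.out.println(toCommaSeparated("/opt/lib/a.jar:/opt/lib/b.jar"));
    }
}
```

If the separator is left as ':' the whole classpath would presumably be read as a single entry, which may be the kind of unexpected error raised in this thread.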
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114873#comment-15114873 ]

Zhong Yanghong commented on KYLIN-1082:

Both ways append the related jars to the property "tmpjars".
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114871#comment-15114871 ]

Zhong Yanghong commented on KYLIN-1082:

In my implementation, there are two ways to distribute dependent jars to the data nodes. One is appending the related jars to the property "kylin.job.mr.lib.dir"; the other is setting the property "kylin.job.mr.lib.dir" and copying the related jars into this specified directory. Every machine running Kylin is supposed to have Hive installed, so a shell script called "find-hive-dependency.sh" finds the Hive dependencies and sets the property "kylin.hive.dependency". To avoid uploading too many useless jars that Kylin jobs will not use, there is a filter inside "AbstractHadoopJob.java" that keeps only the jars to be used.
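The filter described in this comment could look roughly like the sketch below. It assumes "kylin.hive.dependency" holds a ':'-separated list of paths and keeps only the three jars named elsewhere in this thread (hive-exec, hive-metastore, hive-hcatalog-core). The class name, the matching rule, and the example paths are assumptions for illustration, not the actual code in AbstractHadoopJob.java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HiveDependencyFilter {
    // Jar base names this thread reports as needed on the data nodes.
    private static final List<String> WANTED =
            Arrays.asList("hive-exec", "hive-metastore", "hive-hcatalog-core");

    /** Splits the property value on ':' and keeps only entries that are wanted jars. */
    static List<String> filter(String hiveDependency) {
        List<String> kept = new ArrayList<>();
        if (hiveDependency == null) {
            return kept;
        }
        for (String entry : hiveDependency.split(":")) {
            String name = entry.substring(entry.lastIndexOf('/') + 1);
            if (!name.endsWith(".jar")) {
                continue; // directories such as a conf dir are not shipped
            }
            for (String base : WANTED) {
                // Accept "base.jar" and "base-<anything>.jar".
                if (name.equals(base + ".jar") || name.startsWith(base + "-")) {
                    kept.add(entry);
                    break;
                }
            }
        }
        return kept;
    }
}
```

Joining the returned list with ',' would give a value in the form that "tmpjars" expects.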
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114866#comment-15114866 ]

wangxianbin commented on KYLIN-1082:

Hi Yanghong, the implementation seems a little different from the original agreement in KYLIN-1021, which offers two ways of deploying the Kylin environment, distinguished by the property "kylin.job.mr.lib.dir". However, your implementation adds Hive dependencies into tmpjars unconditionally.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114853#comment-15114853 ]

wangxianbin commented on KYLIN-1082:

Sorry for getting involved so late. I have been quite busy for the last two weeks.
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112120#comment-15112120 ] Zhong Yanghong commented on KYLIN-1082: --- After checking, only three jars need to be distributed to the datanodes: hive-exec.jar, hive-metastore.jar, and hive-hcatalog-core.jar. > Hive dependencies should be add to tmpjars > -- > > Key: KYLIN-1082 > URL: https://issues.apache.org/jira/browse/KYLIN-1082 > Project: Kylin > Issue Type: Bug >Reporter: liyang >Assignee: Zhong Yanghong > Labels: newbie > Attachments: > 0001-For-achieving-automatically-upload-hive-related-jars.patch > > > Currently kylin assume all data nodes have hive deployment at exact same FS > location. However, a better position is to think hive as a client side app. > Then we need to ship hive jar with MR job every time. > This make deploy kylin a lot easier in cluster that does not have hive on all > data nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
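Picking those three jars out of a longer hive classpath might look like the sketch below. It is an illustration only, not the code from the attached patch: the class name and the regular expression are made up here, and the input is assumed to be a colon-separated list of paths. The pattern accepts both versioned file names and names without any version information.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not the patch attached to this issue. */
final class HiveJarFilter {
    // Assumed pattern: one of the three base names, an optional "-<version>" part, then ".jar".
    private static final Pattern NEEDED = Pattern.compile(
            "(hive-exec|hive-metastore|hive-hcatalog-core)(-[0-9][^/]*)?\\.jar");

    /** Returns the classpath entries whose file name matches one of the three jars. */
    static List<String> filter(String classpath) {
        List<String> result = new ArrayList<>();
        for (String entry : classpath.split(":")) {
            if (entry.isEmpty()) {
                continue;
            }
            // Match on the file name only, so directory names cannot interfere.
            String fileName = entry.substring(entry.lastIndexOf('/') + 1);
            if (NEEDED.matcher(fileName).matches()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example paths are assumptions, chosen only to exercise the pattern.
        String classpath = "/etc/hive/conf:/usr/lib/hive/lib/hive-exec-1.2.1.jar"
                + ":/usr/lib/hive/lib/hive-jdbc-1.2.1.jar:/usr/lib/hive/lib/hive-metastore.jar";
        System.out.println(filter(classpath));
    }
}
```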
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110855#comment-15110855 ] Zhong Yanghong commented on KYLIN-1082: --- Thank you, guys. My initial version is finally done. I mainly changed the file engine-mr/src/main/java/org/apache/kylin/engine/mr/common/AbstractHadoopJob.java. The new function filters the system property "kylin.hive.dependency" to pick out the hive dependency jars of interest, which are then added to Hadoop's special property "tmpjars"; the jars listed there are uploaded to each datanode that runs the MapReduce work. How "kylin.hive.dependency" gets its value depends on the platform. On development and testing machines that don't have hive installed and start KYLIN through "DebugTomcat.java", just add System.setProperty("kylin.hive.dependency",""). On the sandbox, which has hive installed and starts KYLIN through kylin.sh, the shell script find-hive-dependency.sh run by kylin.sh sets the property automatically. There is another way to add further helpful jars: set the property "kylin.job.mr.lib.dir" in the file "kylin.properties". AbstractHadoopJob.java will then collect all of the jars and files under this user-defined directory, including its subdirectories, and add them to "tmpjars". > Hive dependencies should be add to tmpjars > -- > > Key: KYLIN-1082 > URL: https://issues.apache.org/jira/browse/KYLIN-1082 > Project: Kylin > Issue Type: Bug >Reporter: liyang >Assignee: Zhong Yanghong > Labels: newbie > > Currently kylin assume all data nodes have hive deployment at exact same FS > location. However, a better position is to think hive as a client side app. > Then we need to ship hive jar with MR job every time. > This make deploy kylin a lot easier in cluster that does not have hive on all > data nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
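The two steps described in the comment above, merging jar paths into the comma-separated "tmpjars" value and collecting every file under a lib directory, could be sketched as follows. This is not the code in AbstractHadoopJob.java: the class and method names are hypothetical, `java.util.Properties` stands in for the Hadoop job configuration so the example runs without Hadoop on the classpath, and a local directory walk stands in for whatever file system the real lib directory lives on.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch; Properties is a stand-in for the Hadoop job configuration. */
final class TmpJarsSketch {
    static final String TMPJARS = "tmpjars";

    /** Merges paths into the comma-separated tmpjars value, keeping order and dropping duplicates. */
    static void appendToTmpJars(Properties jobConf, Collection<String> paths) {
        Set<String> merged = new LinkedHashSet<>();
        for (String existing : jobConf.getProperty(TMPJARS, "").split(",")) {
            if (!existing.isEmpty()) {
                merged.add(existing);
            }
        }
        merged.addAll(paths);
        if (!merged.isEmpty()) {
            jobConf.setProperty(TMPJARS, String.join(",", merged));
        }
    }

    /** Lists every regular file under libDir, subdirectories included, in sorted order. */
    static List<String> listFilesRecursively(Path libDir) throws IOException {
        try (Stream<Path> walk = Files.walk(libDir)) {
            return walk.filter(Files::isRegularFile)
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Properties jobConf = new Properties();
        // Example jar path only; in the thread these come from "kylin.hive.dependency".
        appendToTmpJars(jobConf, Arrays.asList("/usr/lib/hive/lib/hive-exec.jar"));
        if (args.length > 0) {
            // Optional first argument plays the role of "kylin.job.mr.lib.dir".
            appendToTmpJars(jobConf, listFilesRecursively(Paths.get(args[0])));
        }
        System.out.println(jobConf.getProperty(TMPJARS));
    }
}
```

Merging through an ordered set keeps any jars that were already listed in "tmpjars" and avoids listing the same jar twice when both sources mention it.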
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106154#comment-15106154 ] liyang commented on KYLIN-1082: --- Hi Xianbin, just realized I overlooked the history when asking Yanghong to help. Nevertheless, please help review Yanghong's patch when it's ready. The credit shall go to both of you. Thanks! > Hive dependencies should be add to tmpjars > -- > > Key: KYLIN-1082 > URL: https://issues.apache.org/jira/browse/KYLIN-1082 > Project: Kylin > Issue Type: Bug >Reporter: liyang >Assignee: Zhong Yanghong > Labels: newbie > > Currently kylin assume all data nodes have hive deployment at exact same FS > location. However, a better position is to think hive as a client side app. > Then we need to ship hive jar with MR job every time. > This make deploy kylin a lot easier in cluster that does not have hive on all > data nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093523#comment-15093523 ] wangxianbin commented on KYLIN-1082: OK, I think all we need to do is merge the new patch uploaded by fengyu, maybe with a little modification. > Hive dependencies should be add to tmpjars > -- > > Key: KYLIN-1082 > URL: https://issues.apache.org/jira/browse/KYLIN-1082 > Project: Kylin > Issue Type: Bug >Reporter: liyang >Assignee: Zhong Yanghong > Labels: newbie > > Currently kylin assume all data nodes have hive deployment at exact same FS > location. However, a better position is to think hive as a client side app. > Then we need to ship hive jar with MR job every time. > This make deploy kylin a lot easier in cluster that does not have hive on all > data nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093507#comment-15093507 ] liyang commented on KYLIN-1082: --- Yes, it's agreed that the hive jars had better be submitted automatically, and that's why we have this JIRA. I've invited Zhong, Yanghong to work on this item. > Hive dependencies should be add to tmpjars > -- > > Key: KYLIN-1082 > URL: https://issues.apache.org/jira/browse/KYLIN-1082 > Project: Kylin > Issue Type: Bug >Reporter: liyang >Assignee: Zhong Yanghong > Labels: newbie > > Currently kylin assume all data nodes have hive deployment at exact same FS > location. However, a better position is to think hive as a client side app. > Then we need to ship hive jar with MR job every time. > This make deploy kylin a lot easier in cluster that does not have hive on all > data nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088945#comment-15088945 ] wangxianbin commented on KYLIN-1082: Hi Yang, after investigating, I found that you guys have merged fengyu's patch for KYLIN-1021, although with it users still need to upload the hive dependency jars to HDFS manually and set "kylin.job.mr.lib.dir" in kylin.properties, and it seems you already have an agreement on this. However, there is another patch from fengyu in KYLIN-1021 that uploads the jars automatically; all you need to do is enable "kylin.job.upload.jars.enabled" in kylin.properties. That patch has not been merged into Kylin. So do you now prefer to upload the jars automatically? Is that why you created this ticket? > Hive dependencies should be add to tmpjars > -- > > Key: KYLIN-1082 > URL: https://issues.apache.org/jira/browse/KYLIN-1082 > Project: Kylin > Issue Type: Bug >Reporter: liyang >Assignee: liyang > Labels: newbie > > Currently kylin assume all data nodes have hive deployment at exact same FS > location. However, a better position is to think hive as a client side app. > Then we need to ship hive jar with MR job every time. > This make deploy kylin a lot easier in cluster that does not have hive on all > data nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15084604#comment-15084604 ] wangxianbin commented on KYLIN-1082: Hi, liyang! Which version should I work on, 2.x-staging or 2.0-rc? Any suggestion? > Hive dependencies should be add to tmpjars > -- > > Key: KYLIN-1082 > URL: https://issues.apache.org/jira/browse/KYLIN-1082 > Project: Kylin > Issue Type: Bug >Reporter: liyang >Assignee: liyang > Labels: newbie > > Currently kylin assume all data nodes have hive deployment at exact same FS > location. However, a better position is to think hive as a client side app. > Then we need to ship hive jar with MR job every time. > This make deploy kylin a lot easier in cluster that does not have hive on all > data nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082763#comment-15082763 ] liyang commented on KYLIN-1082: --- Xianbin, just go ahead. You will be the assignee once you are a committer. :-) > Hive dependencies should be add to tmpjars > -- > > Key: KYLIN-1082 > URL: https://issues.apache.org/jira/browse/KYLIN-1082 > Project: Kylin > Issue Type: Bug >Reporter: liyang >Assignee: liyang > Labels: newbie > > Currently kylin assume all data nodes have hive deployment at exact same FS > location. However, a better position is to think hive as a client side app. > Then we need to ship hive jar with MR job every time. > This make deploy kylin a lot easier in cluster that does not have hive on all > data nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1082) Hive dependencies should be add to tmpjars
[ https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082501#comment-15082501 ] wangxianbin commented on KYLIN-1082: Hi liyang, can you assign this task to me? > Hive dependencies should be add to tmpjars > -- > > Key: KYLIN-1082 > URL: https://issues.apache.org/jira/browse/KYLIN-1082 > Project: Kylin > Issue Type: Bug >Reporter: liyang >Assignee: liyang > Labels: newbie > > Currently kylin assume all data nodes have hive deployment at exact same FS > location. However, a better position is to think hive as a client side app. > Then we need to ship hive jar with MR job every time. > This make deploy kylin a lot easier in cluster that does not have hive on all > data nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)