RE: Deserializing map column via JDBC (HIVE-1378)
> The simplest thing to do is to:
> 1. Rename "useJSONforLazy" to "useDelimitedJSON";
> 2. Use "DelimitedJSONSerDe" when useDelimitedJSON = true;

So, DelimitedJSONSerDe will need the same deserialization capability as LazySimpleSerDe?


-----Original Message-----
From: Zheng Shao [mailto:zs...@facebook.com]
Sent: Thursday, September 02, 2010 7:19 PM
To: Steven Wong; hive-dev@hadoop.apache.org
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

Earlier there was no multi-level delimited format - the only way was first-level delimited, and then JSON. Some legacy scripts/apps have been written to work with that.

Later we introduced the multi-level delimited format, and made the hack to put them together.

Zheng

-----Original Message-----
From: Steven Wong [mailto:sw...@netflix.com]
Sent: Friday, September 03, 2010 10:17 AM
To: Zheng Shao; hive-dev@hadoop.apache.org
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

Why was/is useJSONforLazy needed? What's the historical background?

-----Original Message-----
From: Zheng Shao [mailto:zs...@facebook.com]
Sent: Thursday, September 02, 2010 7:11 PM
To: Steven Wong; hive-dev@hadoop.apache.org
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

The simplest thing to do is to:
1. Rename "useJSONforLazy" to "useDelimitedJSON";
2. Use "DelimitedJSONSerDe" when useDelimitedJSON = true;

Zheng

-----Original Message-----
From: Steven Wong [mailto:sw...@netflix.com]
Sent: Friday, September 03, 2010 10:05 AM
To: Zheng Shao; hive-dev@hadoop.apache.org
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

Zheng,

In LazySimpleSerDe.initSerdeParams:

    String useJsonSerialize = tbl
        .getProperty(Constants.SERIALIZATION_USE_JSON_OBJECTS);
    serdeParams.jsonSerialize = (useJsonSerialize != null && useJsonSerialize
        .equalsIgnoreCase("true"));

SERIALIZATION_USE_JSON_OBJECTS is set to true in PlanUtils.getTableDesc:

    // It is not a very clean way, and should be modified later - due to
    // compatiblity reasons,
    // user sees the results as json for custom scripts and has no way for
    // specifying that.
    // Right now, it is hard-coded in the code
    if (useJSONForLazy) {
      properties.setProperty(Constants.SERIALIZATION_USE_JSON_OBJECTS, "true");
    }

useJSONForLazy is true in the following 2 calls to PlanUtils.getTableDesc:

    SemanticAnalyzer.genScriptPlan -> PlanUtils.getTableDesc
    SemanticAnalyzer.genScriptPlan -> SemanticAnalyzer.getTableDescFromSerDe -> PlanUtils.getTableDesc

What is it all about, and how should we untangle it (ideally getting rid of SERIALIZATION_USE_JSON_OBJECTS)?

Thanks.
Steven

-----Original Message-----
From: Zheng Shao [mailto:zs...@facebook.com]
Sent: Wednesday, September 01, 2010 6:45 PM
To: Steven Wong; hive-dev@hadoop.apache.org; John Sichi
Cc: Jerome Boulon
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

Hi Steven,

As far as I remember, the only use case of the JSON logic in LazySimpleSerDe is the FetchTask. Even if there are other cases, we should be able to catch them in unit tests.

The potential risk is small enough, and the benefit of cleaning it up is pretty big - it makes the code much easier to understand.

Thanks for getting to it, Steven! I am very happy to see that this finally gets cleaned up!

Zheng

-----Original Message-----
From: Steven Wong [mailto:sw...@netflix.com]
Sent: Thursday, September 02, 2010 7:45 AM
To: Zheng Shao; hive-dev@hadoop.apache.org; John Sichi
Cc: Jerome Boulon
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

Your suggestion is in line with my earlier proposal of fixing FetchTask. The only major difference is moving the JSON-related logic from LazySimpleSerDe to a new serde called DelimitedJSONSerDe.

Is it safe to get rid of the JSON-related logic in LazySimpleSerDe? It sounds like you're implying that it is safe, but I'd like to confirm with you. I don't really know whether there are components other than FetchTask that rely on LazySimpleSerDe and its JSON capability (the useJSONSerialize flag doesn't have to be true for LazySimpleSerDe to use JSON). If it is safe, I am totally fine with introducing DelimitedJSONSerDe.

Combining your suggestion and my proposal would look like:

0. Move the JSON serialization logic from LazySimpleSerDe to a new serde called DelimitedJSONSerDe.
1. By default, hive.fetch.output.serde = DelimitedJSONSerDe.
2. When the JDBC driver connects to the Hive server, execute "set hive.fetch.output.serde = LazySimpleSerDe".
3. In the Hive server:
   (a) If hive.fetch.output.serde == DelimitedJSONSerDe, FetchTask uses DelimitedJSONSerDe to maintain today's serialization behavior (tab for the field delimiter, "NULL" for the null sequence, JSON for non-primitives).
   (b) If hive.fetch.output.serde == LazySimpleSerDe, FetchTask uses LazySimpleSerDe with a schema to ctrl-delimit everything.
4. The JDBC driver deserializes with LazySimpleSerDe instead of DynamicSerDe.

Steven

-----Original Message-----
From:
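For readers following this thread, a rough illustration of the serialization behavior under discussion - plain delimited text for primitive columns, JSON only for complex ones - is sketched below. The class and method names are hypothetical (this is not the committed DelimitedJSONSerDe); it assumes the existing SerDeUtils.getJSONString helper and the standard ObjectInspector APIs.

    import java.util.List;
    import org.apache.hadoop.hive.serde2.SerDeUtils;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
    import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.StructField;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;

    /** Hypothetical sketch: tab-delimited top-level fields, JSON only for non-primitive fields. */
    public class DelimitedJsonRowFormatter {

      public String formatRow(Object row, StructObjectInspector rowOI) {
        StringBuilder out = new StringBuilder();
        List<? extends StructField> fields = rowOI.getAllStructFieldRefs();
        for (int i = 0; i < fields.size(); i++) {
          if (i > 0) {
            out.append('\t');                               // keep tab as the field delimiter
          }
          StructField field = fields.get(i);
          Object data = rowOI.getStructFieldData(row, field);
          ObjectInspector fieldOI = field.getFieldObjectInspector();
          if (data == null) {
            out.append("NULL");                             // keep today's null sequence
          } else if (fieldOI.getCategory() == Category.PRIMITIVE) {
            // primitives stay plain delimited text, as LazySimpleSerDe emits them today
            out.append(String.valueOf(
                ((PrimitiveObjectInspector) fieldOI).getPrimitiveJavaObject(data)));
          } else {
            // maps/lists/structs become JSON, the behavior FetchTask currently relies on
            out.append(SerDeUtils.getJSONString(data, fieldOI));
          }
        }
        return out.toString();
      }
    }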
RE: Deserializing map column via JDBC (HIVE-1378)
Why was/is useJSONforLazy needed? What's the historical background?


-----Original Message-----
From: Zheng Shao [mailto:zs...@facebook.com]
Sent: Thursday, September 02, 2010 7:11 PM
To: Steven Wong; hive-dev@hadoop.apache.org
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

The simplest thing to do is to:
1. Rename "useJSONforLazy" to "useDelimitedJSON";
2. Use "DelimitedJSONSerDe" when useDelimitedJSON = true;

Zheng

-----Original Message-----
From: Steven Wong [mailto:sw...@netflix.com]
Sent: Friday, September 03, 2010 10:05 AM
To: Zheng Shao; hive-dev@hadoop.apache.org
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

Zheng,

In LazySimpleSerDe.initSerdeParams:

    String useJsonSerialize = tbl
        .getProperty(Constants.SERIALIZATION_USE_JSON_OBJECTS);
    serdeParams.jsonSerialize = (useJsonSerialize != null && useJsonSerialize
        .equalsIgnoreCase("true"));

SERIALIZATION_USE_JSON_OBJECTS is set to true in PlanUtils.getTableDesc:

    // It is not a very clean way, and should be modified later - due to
    // compatiblity reasons,
    // user sees the results as json for custom scripts and has no way for
    // specifying that.
    // Right now, it is hard-coded in the code
    if (useJSONForLazy) {
      properties.setProperty(Constants.SERIALIZATION_USE_JSON_OBJECTS, "true");
    }

useJSONForLazy is true in the following 2 calls to PlanUtils.getTableDesc:

    SemanticAnalyzer.genScriptPlan -> PlanUtils.getTableDesc
    SemanticAnalyzer.genScriptPlan -> SemanticAnalyzer.getTableDescFromSerDe -> PlanUtils.getTableDesc

What is it all about, and how should we untangle it (ideally getting rid of SERIALIZATION_USE_JSON_OBJECTS)?

Thanks.
Steven

-----Original Message-----
From: Zheng Shao [mailto:zs...@facebook.com]
Sent: Wednesday, September 01, 2010 6:45 PM
To: Steven Wong; hive-dev@hadoop.apache.org; John Sichi
Cc: Jerome Boulon
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

Hi Steven,

As far as I remember, the only use case of the JSON logic in LazySimpleSerDe is the FetchTask. Even if there are other cases, we should be able to catch them in unit tests.

The potential risk is small enough, and the benefit of cleaning it up is pretty big - it makes the code much easier to understand.

Thanks for getting to it, Steven! I am very happy to see that this finally gets cleaned up!

Zheng

-----Original Message-----
From: Steven Wong [mailto:sw...@netflix.com]
Sent: Thursday, September 02, 2010 7:45 AM
To: Zheng Shao; hive-dev@hadoop.apache.org; John Sichi
Cc: Jerome Boulon
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

Your suggestion is in line with my earlier proposal of fixing FetchTask. The only major difference is moving the JSON-related logic from LazySimpleSerDe to a new serde called DelimitedJSONSerDe.

Is it safe to get rid of the JSON-related logic in LazySimpleSerDe? It sounds like you're implying that it is safe, but I'd like to confirm with you. I don't really know whether there are components other than FetchTask that rely on LazySimpleSerDe and its JSON capability (the useJSONSerialize flag doesn't have to be true for LazySimpleSerDe to use JSON). If it is safe, I am totally fine with introducing DelimitedJSONSerDe.

Combining your suggestion and my proposal would look like:

0. Move the JSON serialization logic from LazySimpleSerDe to a new serde called DelimitedJSONSerDe.
1. By default, hive.fetch.output.serde = DelimitedJSONSerDe.
2. When the JDBC driver connects to the Hive server, execute "set hive.fetch.output.serde = LazySimpleSerDe".
3. In the Hive server:
   (a) If hive.fetch.output.serde == DelimitedJSONSerDe, FetchTask uses DelimitedJSONSerDe to maintain today's serialization behavior (tab for the field delimiter, "NULL" for the null sequence, JSON for non-primitives).
   (b) If hive.fetch.output.serde == LazySimpleSerDe, FetchTask uses LazySimpleSerDe with a schema to ctrl-delimit everything.
4. The JDBC driver deserializes with LazySimpleSerDe instead of DynamicSerDe.

Steven

-----Original Message-----
From: Zheng Shao [mailto:zs...@facebook.com]
Sent: Wednesday, September 01, 2010 3:22 AM
To: Steven Wong; hive-dev@hadoop.apache.org; John Sichi
Cc: Jerome Boulon
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

Hi Steven,

Sorry for the late reply. The email slipped my eye...

This issue was brought up multiple times. In my opinion, using JSON in LazySimpleSerDe (inherited from ColumnsetSerDe, MetadataColumnsetSerDe, DynamicSerDe) was a long-time legacy problem that never got fixed. LazySimpleSerDe was supposed to do the delimited format only.

The cleanest way to do that is to:
1. Get rid of the JSON-related logic in LazySimpleSerDe;
2. Introduce another "DelimitedJSONSerDe" (without deserialization capability) that does JSON serialization for complex fields. (We never have or need deserialization for JSON yet.)
3. Configure the FetchTask to use the new SerDe by default, and LazySimpleSerDe in case it's JDBC. This
RE: Deserializing map column via JDBC (HIVE-1378)
Zheng,

In LazySimpleSerDe.initSerdeParams:

    String useJsonSerialize = tbl
        .getProperty(Constants.SERIALIZATION_USE_JSON_OBJECTS);
    serdeParams.jsonSerialize = (useJsonSerialize != null && useJsonSerialize
        .equalsIgnoreCase("true"));

SERIALIZATION_USE_JSON_OBJECTS is set to true in PlanUtils.getTableDesc:

    // It is not a very clean way, and should be modified later - due to
    // compatiblity reasons,
    // user sees the results as json for custom scripts and has no way for
    // specifying that.
    // Right now, it is hard-coded in the code
    if (useJSONForLazy) {
      properties.setProperty(Constants.SERIALIZATION_USE_JSON_OBJECTS, "true");
    }

useJSONForLazy is true in the following 2 calls to PlanUtils.getTableDesc:

    SemanticAnalyzer.genScriptPlan -> PlanUtils.getTableDesc
    SemanticAnalyzer.genScriptPlan -> SemanticAnalyzer.getTableDescFromSerDe -> PlanUtils.getTableDesc

What is it all about, and how should we untangle it (ideally getting rid of SERIALIZATION_USE_JSON_OBJECTS)?

Thanks.
Steven

-----Original Message-----
From: Zheng Shao [mailto:zs...@facebook.com]
Sent: Wednesday, September 01, 2010 6:45 PM
To: Steven Wong; hive-dev@hadoop.apache.org; John Sichi
Cc: Jerome Boulon
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

Hi Steven,

As far as I remember, the only use case of the JSON logic in LazySimpleSerDe is the FetchTask. Even if there are other cases, we should be able to catch them in unit tests.

The potential risk is small enough, and the benefit of cleaning it up is pretty big - it makes the code much easier to understand.

Thanks for getting to it, Steven! I am very happy to see that this finally gets cleaned up!

Zheng

-----Original Message-----
From: Steven Wong [mailto:sw...@netflix.com]
Sent: Thursday, September 02, 2010 7:45 AM
To: Zheng Shao; hive-dev@hadoop.apache.org; John Sichi
Cc: Jerome Boulon
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

Your suggestion is in line with my earlier proposal of fixing FetchTask. The only major difference is moving the JSON-related logic from LazySimpleSerDe to a new serde called DelimitedJSONSerDe.

Is it safe to get rid of the JSON-related logic in LazySimpleSerDe? It sounds like you're implying that it is safe, but I'd like to confirm with you. I don't really know whether there are components other than FetchTask that rely on LazySimpleSerDe and its JSON capability (the useJSONSerialize flag doesn't have to be true for LazySimpleSerDe to use JSON). If it is safe, I am totally fine with introducing DelimitedJSONSerDe.

Combining your suggestion and my proposal would look like:

0. Move the JSON serialization logic from LazySimpleSerDe to a new serde called DelimitedJSONSerDe.
1. By default, hive.fetch.output.serde = DelimitedJSONSerDe.
2. When the JDBC driver connects to the Hive server, execute "set hive.fetch.output.serde = LazySimpleSerDe".
3. In the Hive server:
   (a) If hive.fetch.output.serde == DelimitedJSONSerDe, FetchTask uses DelimitedJSONSerDe to maintain today's serialization behavior (tab for the field delimiter, "NULL" for the null sequence, JSON for non-primitives).
   (b) If hive.fetch.output.serde == LazySimpleSerDe, FetchTask uses LazySimpleSerDe with a schema to ctrl-delimit everything.
4. The JDBC driver deserializes with LazySimpleSerDe instead of DynamicSerDe.

Steven

-----Original Message-----
From: Zheng Shao [mailto:zs...@facebook.com]
Sent: Wednesday, September 01, 2010 3:22 AM
To: Steven Wong; hive-dev@hadoop.apache.org; John Sichi
Cc: Jerome Boulon
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

Hi Steven,

Sorry for the late reply. The email slipped my eye...

This issue was brought up multiple times. In my opinion, using JSON in LazySimpleSerDe (inherited from ColumnsetSerDe, MetadataColumnsetSerDe, DynamicSerDe) was a long-time legacy problem that never got fixed. LazySimpleSerDe was supposed to do the delimited format only.

The cleanest way to do that is to:
1. Get rid of the JSON-related logic in LazySimpleSerDe;
2. Introduce another "DelimitedJSONSerDe" (without deserialization capability) that does JSON serialization for complex fields. (We never have or need deserialization for JSON yet.)
3. Configure the FetchTask to use the new SerDe by default, and LazySimpleSerDe in case it's JDBC. This is for serialization only. We might need to have 2 SerDe fields in FetchTask - one for deserializing the data from the file, one for serializing the data to stdout/jdbc etc.

I can help review the code (please ping me) if you decide to go down this route.

Zheng

-----Original Message-----
From: Steven Wong [mailto:sw...@netflix.com]
Sent: Monday, August 30, 2010 3:46 PM
To: hive-dev@hadoop.apache.org; John Sichi
Cc: Zheng Shao; Jerome Boulon
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

Any guidance on how I/we should proceed on HIVE-1378 and HIVE-1606?

-----Original Message-----
From: Steven Wong
Sent: Friday, August 27, 201
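Step 3 of the proposal quoted above hinges on FetchTask choosing its output serde from a session-level setting. A minimal sketch of that selection follows; hive.fetch.output.serde is the property name proposed in this thread (not an existing Hive setting), and the helper class is hypothetical.

    import java.util.Properties;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hive.serde2.SerDe;
    import org.apache.hadoop.util.ReflectionUtils;

    /** Hypothetical helper showing how FetchTask could pick its output serde. */
    public final class FetchOutputSerDe {

      public static SerDe create(Configuration conf, Properties tableProps) throws Exception {
        // Proposed default keeps today's behavior; the JDBC path would "set" this to LazySimpleSerDe.
        String serdeName = conf.get("hive.fetch.output.serde",
            "org.apache.hadoop.hive.serde2.DelimitedJSONSerDe");
        Class<?> serdeClass = Class.forName(serdeName, true, FetchOutputSerDe.class.getClassLoader());
        SerDe serde = (SerDe) ReflectionUtils.newInstance(serdeClass, conf);
        serde.initialize(conf, tableProps);   // the schema in tableProps decides delimiters, null sequence, etc.
        return serde;
      }

      private FetchOutputSerDe() {
      }
    }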
[jira] Commented: (HIVE-1609) Support partition filtering in metastore
[ https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905769#action_12905769 ]

Carl Steinbach commented on HIVE-1609:
--------------------------------------

DynamicSerDe is the component that has a JavaCC dependency. I think DynamicSerDe (and TCTLSeparatedProtocol) were deprecated a long time ago. Should we try to remove this code?

> Support partition filtering in metastore
> -----------------------------------------
>
> Key: HIVE-1609
> URL: https://issues.apache.org/jira/browse/HIVE-1609
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Metastore
> Reporter: Ajay Kidave
> Fix For: 0.7.0
> Attachments: hive_1609.patch, hive_1609_2.patch
>
> The metastore needs to have support for returning a list of partitions based on user specified filter conditions. This will be useful for tools which need to do partition pruning. Howl is one such use case. The way partition pruning is done during hive query execution need not be changed.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1609) Support partition filtering in metastore
[ https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905768#action_12905768 ]

Namit Jain commented on HIVE-1609:
----------------------------------

I think we should stick to antlr only - let us not check in javacc

> Support partition filtering in metastore
> -----------------------------------------
>
> Key: HIVE-1609
> URL: https://issues.apache.org/jira/browse/HIVE-1609
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Metastore
> Reporter: Ajay Kidave
> Fix For: 0.7.0
> Attachments: hive_1609.patch, hive_1609_2.patch
>
> The metastore needs to have support for returning a list of partitions based on user specified filter conditions. This will be useful for tools which need to do partition pruning. Howl is one such use case. The way partition pruning is done during hive query execution need not be changed.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905767#action_12905767 ]

Namit Jain commented on HIVE-1546:
----------------------------------

Sorry on jumping on this late. I quickly reviewed http://wiki.apache.org/pig/Howl/HowlCliFuncSpec, and it seems like most of the functionality is already present in hive.

So, we need a way to restrict other types of statements - is that a fair statement ? If there is a slight change needed in hive (for some howl behavior), we can add it to hive ? Why do we need a brand new client ?

> Ability to plug custom Semantic Analyzers for Hive Grammar
> ----------------------------------------------------------
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch
>
> It will be useful if Semantic Analysis phase is made pluggable such that other projects can do custom analysis of hive queries before doing metastore operations on them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1580) cleanup ExecDriver.progress
[ https://issues.apache.org/jira/browse/HIVE-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905766#action_12905766 ]

Namit Jain commented on HIVE-1580:
----------------------------------

+1

> cleanup ExecDriver.progress
> ---------------------------
>
> Key: HIVE-1580
> URL: https://issues.apache.org/jira/browse/HIVE-1580
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Joydeep Sen Sarma
> Assignee: Joydeep Sen Sarma
> Attachments: hive-1580.1.patch
>
> a few problems:
> - if a job is retired - then counters cannot be obtained and a stack trace is printed out (from history code). this confuses users
> - too many calls to getCounters. after a job has been detected to be finished - there are quite a few more calls to get the job status and the counters. we need to figure out a way to curtail this - in busy clusters the gap between the job getting finished and the hive client noticing is very perceptible and impacts user experience.
> calls to getCounters are very expensive in 0.20 as they grab a jobtracker global lock (something we have fixed internally at FB)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1580) cleanup ExecDriver.progress
[ https://issues.apache.org/jira/browse/HIVE-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1580:
------------------------------------

    Attachment: hive-1580.1.patch

cleanup multiple calls to getCounters (which turns out to be a really expensive call in the JT) and don't print non-fatal stack traces to the console.

> cleanup ExecDriver.progress
> ---------------------------
>
> Key: HIVE-1580
> URL: https://issues.apache.org/jira/browse/HIVE-1580
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Joydeep Sen Sarma
> Assignee: Joydeep Sen Sarma
> Attachments: hive-1580.1.patch
>
> a few problems:
> - if a job is retired - then counters cannot be obtained and a stack trace is printed out (from history code). this confuses users
> - too many calls to getCounters. after a job has been detected to be finished - there are quite a few more calls to get the job status and the counters. we need to figure out a way to curtail this - in busy clusters the gap between the job getting finished and the hive client noticing is very perceptible and impacts user experience.
> calls to getCounters are very expensive in 0.20 as they grab a jobtracker global lock (something we have fixed internally at FB)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905757#action_12905757 ]

Carl Steinbach commented on HIVE-1546:
--------------------------------------

I gather from Ashutosh's latest patch that you want to do the following:

* Provide your own implementation of HiveSemanticAnalyzerFactory.
* Subclass SemanticAnalyzer
* Subclass DDLSemanticAnalyzer

I looked at the public and protected members in these classes and think that at a minimum we would have to mark the following classes as limited private and evolving:

* HiveSemanticAnalyzerFactory
* BaseSemanticAnalyzer
* SemanticAnalyzer
* DDLSemanticAnalyzer
* ASTNode
* HiveParser (i.e. Hive's ANTLR grammar)
* SemanticAnalyzer Context (org.apache.hadoop.hive.ql.Context)
* Task and FetchTask
* QB
* QBParseInfo
* QBMetaData
* QBJoinTree
* CreateTableDesc

So anytime we touch one of these classes we would need to coordinate with the Howl folks to make sure we aren't breaking one of their plugins? I don't think this is a good tradeoff if the main benefit we can expect is a simpler build and release process for Howl.

> Ability to plug custom Semantic Analyzers for Hive Grammar
> ----------------------------------------------------------
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch
>
> It will be useful if Semantic Analysis phase is made pluggable such that other projects can do custom analysis of hive queries before doing metastore operations on them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905752#action_12905752 ]

Namit Jain commented on HIVE-1546:
----------------------------------

Would it be possible to do it via a hook ? Do you want to allow a subset of operations ?

The hook is not very advanced right now, and you cannot change the query plan etc. But, it might be good enough for disallowing a class of statements. We can add more parameters to the hook if need be.

That way, the change will be completely outside hive, and we will be able to use the existing client, but with a limited functionality.

> Ability to plug custom Semantic Analyzers for Hive Grammar
> ----------------------------------------------------------
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch
>
> It will be useful if Semantic Analysis phase is made pluggable such that other projects can do custom analysis of hive queries before doing metastore operations on them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1610) Using CombinedHiveInputFormat causes partToPartitionInfo IOException
[ https://issues.apache.org/jira/browse/HIVE-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905751#action_12905751 ]

He Yongqiang commented on HIVE-1610:
------------------------------------

Sammy, the only change in TestHiveFileFormatUtils is to remove URI scheme checks (1 line change). You actually added some lines of code which were removed by HIVE-1510, and this is the reason the testcase fails.

> Using CombinedHiveInputFormat causes partToPartitionInfo IOException
> ---------------------------------------------------------------------
>
> Key: HIVE-1610
> URL: https://issues.apache.org/jira/browse/HIVE-1610
> Project: Hadoop Hive
> Issue Type: Bug
> Environment: Hadoop 0.20.2
> Reporter: Sammy Yu
> Attachments: 0002-HIVE-1610.-Added-additional-schema-check-to-doGetPar.patch, 0003-HIVE-1610.patch
>
> I have a relatively complicated hive query using CombinedHiveInputFormat:
>
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions=1000;
> set hive.exec.max.dynamic.partitions.pernode=300;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week) select distinct keywords.keyword, keywords.domain, keywords.url, keywords.rank, keywords.universal_rank, keywords.serp_type, keywords.date_indexed, keywords.search_engine_type, keywords.week from keyword_serp_results keywords
> JOIN (select domain, keyword, search_engine_type, week, max_date_indexed, min(rank) as best_rank from (select keywords1.domain, keywords1.keyword, keywords1.search_engine_type, keywords1.week, keywords1.rank, dupkeywords1.max_date_indexed from keyword_serp_results keywords1
> JOIN (select domain, keyword, search_engine_type, week, max(date_indexed) as max_date_indexed from keyword_serp_results group by domain,keyword,search_engine_type,week) dupkeywords1 on keywords1.keyword = dupkeywords1.keyword AND keywords1.domain = dupkeywords1.domain AND keywords1.search_engine_type = dupkeywords1.search_engine_type AND keywords1.week = dupkeywords1.week AND keywords1.date_indexed = dupkeywords1.max_date_indexed) dupkeywords2 group by domain,keyword,search_engine_type,week,max_date_indexed ) dupkeywords3 on keywords.keyword = dupkeywords3.keyword AND keywords.domain = dupkeywords3.domain AND keywords.search_engine_type = dupkeywords3.search_engine_type AND keywords.week = dupkeywords3.week AND keywords.date_indexed = dupkeywords3.max_date_indexed AND keywords.rank = dupkeywords3.best_rank;
>
> This query use to work fine until I updated to r991183 on trunk and started getting this error:
>
> java.io.IOException: cannot find dir = hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/00_0 in partToPartitionInfo:
> [hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=417/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=418/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=419/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100831]
> at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.(CombineHiveInputFormat.java:100)
> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312)
> at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610)
> at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
>
> This query works if I don't change the hive.input.format.
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
>
> I've narrowed down this issue to the commit for HIVE-1510. If I take out the changeset from r987746, everything works as before.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1609) Support partition filtering in metastore
[ https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905743#action_12905743 ]

Ajay Kidave commented on HIVE-1609:
-----------------------------------

The parser was written in javacc since it is derived from similar functionality in Owl. It was decided to reuse the existing parser when the filter representation was discussed. If generated code is the issue, I can change the build to pull javacc through ivy and not have the generated code checked in (it is so currently because that was how it is in serde). Another possibility is we can open another JIRA to change the parser implementation to ANTLR. Do let me know what would work.

> Support partition filtering in metastore
> -----------------------------------------
>
> Key: HIVE-1609
> URL: https://issues.apache.org/jira/browse/HIVE-1609
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Metastore
> Reporter: Ajay Kidave
> Fix For: 0.7.0
> Attachments: hive_1609.patch, hive_1609_2.patch
>
> The metastore needs to have support for returning a list of partitions based on user specified filter conditions. This will be useful for tools which need to do partition pruning. Howl is one such use case. The way partition pruning is done during hive query execution need not be changed.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-849) .. not supported
[ https://issues.apache.org/jira/browse/HIVE-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain resolved HIVE-849.
-----------------------------

    Hadoop Flags: [Reviewed]
      Resolution: Duplicate

OK - sounds good

> .. not supported
> ----------------
>
> Key: HIVE-849
> URL: https://issues.apache.org/jira/browse/HIVE-849
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: Namit Jain
> Assignee: Carl Steinbach

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-849) .. not supported
[ https://issues.apache.org/jira/browse/HIVE-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905719#action_12905719 ]

Carl Steinbach commented on HIVE-849:
-------------------------------------

@Namit: Correct, but this issue is also covered by HIVE-1517, and the comments in that ticket provide more details, so I decided to resolve this ticket as a duplicate of HIVE-1517.

> .. not supported
> ----------------
>
> Key: HIVE-849
> URL: https://issues.apache.org/jira/browse/HIVE-849
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: Namit Jain
> Assignee: Carl Steinbach

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905711#action_12905711 ]

Alan Gates commented on HIVE-1546:
----------------------------------

Using the definitions given in HADOOP-5073, can we call this interface limited private and evolving? We (the Howl team) know it will continue to change, and we understand Hive's desire not to make this a public API. But checking Howl code into Hive just muddles things and makes our build and release process harder.

> Ability to plug custom Semantic Analyzers for Hive Grammar
> ----------------------------------------------------------
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch
>
> It will be useful if Semantic Analysis phase is made pluggable such that other projects can do custom analysis of hive queries before doing metastore operations on them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Reopened: (HIVE-849) .. not supported
[ https://issues.apache.org/jira/browse/HIVE-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain reopened HIVE-849:
-----------------------------

    Assignee: Carl Steinbach  (was: He Yongqiang)

@Carl, I think this referred to the ability of selecting a table from database1 while using database2

> .. not supported
> ----------------
>
> Key: HIVE-849
> URL: https://issues.apache.org/jira/browse/HIVE-849
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: Namit Jain
> Assignee: Carl Steinbach

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905707#action_12905707 ]

Carl Steinbach commented on HIVE-1546:
--------------------------------------

I'm +1 on the approach outlined by John.

> Ability to plug custom Semantic Analyzers for Hive Grammar
> ----------------------------------------------------------
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch
>
> It will be useful if Semantic Analysis phase is made pluggable such that other projects can do custom analysis of hive queries before doing metastore operations on them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905706#action_12905706 ]

John Sichi commented on HIVE-1546:
----------------------------------

That's fine with me if it doesn't drag in unrelated dependencies. I would vote for contrib, with the plugin mechanism remaining the same as Ashutosh has defined it, but with the config parameter explicitly defining it as intended for internal use only for now.

Ashutosh, could you run this proposal by the Howl team and see if that is acceptable?

> Ability to plug custom Semantic Analyzers for Hive Grammar
> ----------------------------------------------------------
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch
>
> It will be useful if Semantic Analysis phase is made pluggable such that other projects can do custom analysis of hive queries before doing metastore operations on them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1467) dynamic partitioning should cluster by partitions
[ https://issues.apache.org/jira/browse/HIVE-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905700#action_12905700 ]

Ning Zhang commented on HIVE-1467:
----------------------------------

As discussed with Joydeep and Ashish, it seems we should use the "distribute by" mechanism rather than "cluster by" to avoid sorting at the reducer side. The difference between them is that "distribute by" only has the MapReduce partition columns set to the dynamic partition columns, while "cluster by" will additionally set the "key columns" to the dynamic partition columns as well.

So I think we can use 2 modes of reducer-side DP, with tradeoffs:

-- distribute by mode: no sorting, but reducers have to keep all files open during DP insert. Good choice when there is a large amount of data passed from mappers to reducers.
-- cluster by mode: sorting by the DP columns, but we can close a DP file once FileSinkOperator sees a different DP column value. Good choice when the total data size is not that large but there is a large number of DPs generated.

> dynamic partitioning should cluster by partitions
> --------------------------------------------------
>
> Key: HIVE-1467
> URL: https://issues.apache.org/jira/browse/HIVE-1467
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Joydeep Sen Sarma
> Assignee: Namit Jain
>
> (based on internal discussion with Ning). Dynamic partitioning should offer a mode where it clusters data by partition before writing out to each partition. This will reduce number of files. Details:
> 1. always use reducer stage
> 2. mapper sends to reducer based on partitioning column. ie. reducer = f(partition-cols)
> 3. f() can be made somewhat smart to:
>    a. spread large partitions across multiple reducers - each mapper can maintain row count seen per partition - and then apply (whenever it sees a new row for a partition):
>       * reducer = (row count / 64k) % numReducers
>       Small partitions always go to one reducer. the larger the partition, the more the reducers. this prevents one reducer becoming bottleneck writing out one partition
>    b. this still leaves the issue of very large number of splits. (64K rows from 10K mappers is pretty large). for this one can apply one slight modification:
>       * reducer = (mapper-id/1024 + row-count/64k) % numReducers
>       ie. - the first 1000 mappers always send the first 64K rows for one partition to the same reducer. the next 1000 send it to the next one. and so on.
> the constants 1024 and 64k are used just as an example. i don't know what the right numbers are. it's also clear that this is a case where we need hadoop to do only partitioning (and no sorting). this will be a useful feature to have in hadoop. that will reduce the overhead due to reducers.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
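The reducer-assignment heuristic from the issue description is compact enough to state directly in code. The sketch below only illustrates the formula; the 1024 and 64K constants are the example values from the description, not tuned numbers.

    /** Illustration of the heuristic reducer = (mapper-id/1024 + row-count/64k) % numReducers. */
    public final class DynamicPartitionReducerChooser {

      private static final int MAPPER_GROUP_SIZE = 1024;      // example constant from the description
      private static final int ROWS_PER_BUCKET = 64 * 1024;   // example constant from the description

      public static int chooseReducer(int mapperId, long rowsSeenForPartition, int numReducers) {
        long bucket = (mapperId / MAPPER_GROUP_SIZE) + (rowsSeenForPartition / ROWS_PER_BUCKET);
        return (int) (bucket % numReducers);
      }

      private DynamicPartitionReducerChooser() {
      }
    }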
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905692#action_12905692 ]

Carl Steinbach commented on HIVE-1546:
--------------------------------------

What do you think of this option: we check the Howl SemanticAnalyzer into the Hive source tree and provide a config option that optionally enables it? This gives Howl the features they need without making the SemanticAnalyzer API public.

> Ability to plug custom Semantic Analyzers for Hive Grammar
> ----------------------------------------------------------
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch
>
> It will be useful if Semantic Analysis phase is made pluggable such that other projects can do custom analysis of hive queries before doing metastore operations on them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1130) Create argmin and argmax
[ https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1130:
-----------------------------

    Status: Open  (was: Patch Available)

> Create argmin and argmax
> ------------------------
>
> Key: HIVE-1130
> URL: https://issues.apache.org/jira/browse/HIVE-1130
> Project: Hadoop Hive
> Issue Type: Improvement
> Affects Versions: 0.7.0
> Reporter: Zheng Shao
> Assignee: Pierre Huyn
> Fix For: 0.7.0
> Attachments: HIVE-1130.1.patch, HIVE-1130.2.patch
>
> With HIVE-1128, users can already do what argmax and argmin does.
> But it will be helpful if we provide these functions explicitly so people from maths/stats background can use it more easily.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905687#action_12905687 ]

John Sichi commented on HIVE-1546:
----------------------------------

It's the usual tradeoffs on copy-and-paste vs factoring. There's a significant amount of DDL processing code which can be shared, and that will continue to grow as we add new features (e.g. GRANT/REVOKE) which are applicable to both.

> Ability to plug custom Semantic Analyzers for Hive Grammar
> ----------------------------------------------------------
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch
>
> It will be useful if Semantic Analysis phase is made pluggable such that other projects can do custom analysis of hive queries before doing metastore operations on them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905686#action_12905686 ]

Carl Steinbach commented on HIVE-1546:
--------------------------------------

bq. we've agreed at the high level on the approach of creating Howl as a wrapper around Hive

I thought Howl was supposed to be a wrapper around (and replacement for) the Hive metastore, not all of Hive. I think there are clear advantages to Hive and Howl sharing the same metastore code as long as they access this facility through the public API, but can't say the same for the two projects using the same CLI code if it means allowing external projects to depend on a loosely defined set of internal APIs.

What benefits are we hoping to achieve by having Howl and Hive share the same CLI code, especially if Howl is only interested in a small part of it? What are the drawbacks of instead encouraging the Howl project to copy the CLI code and maintain their own version?

> Ability to plug custom Semantic Analyzers for Hive Grammar
> ----------------------------------------------------------
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch
>
> It will be useful if Semantic Analysis phase is made pluggable such that other projects can do custom analysis of hive queries before doing metastore operations on them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1609) Support partition filtering in metastore
[ https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905679#action_12905679 ]

John Sichi commented on HIVE-1609:
----------------------------------

I agree with Carl regarding the parser: let's move it to ANTLR. We have too much generated code checked into Hive already, and we're trying to move away from that.

> Support partition filtering in metastore
> -----------------------------------------
>
> Key: HIVE-1609
> URL: https://issues.apache.org/jira/browse/HIVE-1609
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Metastore
> Reporter: Ajay Kidave
> Fix For: 0.7.0
> Attachments: hive_1609.patch, hive_1609_2.patch
>
> The metastore needs to have support for returning a list of partitions based on user specified filter conditions. This will be useful for tools which need to do partition pruning. Howl is one such use case. The way partition pruning is done during hive query execution need not be changed.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Re: Build Crashing on Hive 0.5 Release
On Thu, Sep 2, 2010 at 5:12 PM, Stephen Watt wrote:
> Hi Folks
>
> I'm a Hadoop contributor and am presently working to get both Hadoop and
> Hive running on alternate JREs such as Apache Harmony and IBM Java.
>
> I noticed when building and running the functional tests ("clean test
> tar") for the Hive 0.5 release (i.e. not nightly build), the build
> crashes right after running
> org.apache.hadoop.hive.ql.tool.TestLineageInfo. In addition, the
> TestCLIDriver Test Case fails as well. This is all using SUN JDK 1.60_14.
> I'm running on a SLES 10 system.
>
> This is a little odd, given that this is a release and not a nightly
> build. Although, it's not uncommon for me to see Hudson pass tests that
> fail when running locally. Can someone confirm the build works for them?
>
> This is my build script:
>
> #!/bin/sh
>
> # Set Build Dependencies
> set PATH=$PATH:/home/hive/Java-Versions/jdk1.6.0_14/bin/
> export ANT_HOME=/home/hive/Test-Dependencies/apache-ant-1.7.1
> export JAVA_HOME=/home/hive/Java-Versions/jdk1.6.0_14
> export BUILD_DIR=/home/hive/hive-0.5.0-build
> export HIVE_BUILD=$BUILD_DIR/build
> export HIVE_INSTALL=$BUILD_DIR/hive-0.5.0-dev/
> export HIVE_SRC=$HIVE_INSTALL/src
> export PATH=$PATH:$ANT_HOME/bin
>
> # Define Hadoop Version to Use
> HADOOP_VER=0.20.2
>
> # Run Build and Unit Test
> cd $HIVE_SRC
> ant -Dtarget.dir=$HIVE_BUILD -Dhadoop.version=$HADOOP_VER clean test tar > $BUILD_DIR/hiveSUN32Build.out
>
> Regards
> Steve Watt

I seem to remember there were some older bugs when specifying the minor versions of the 20 branch.

Can you try:
HADOOP_VER=0.20.0
rather than:
HADOOP_VER=0.20.2
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905675#action_12905675 ]

John Sichi commented on HIVE-1546:
----------------------------------

New dependencies: we don't prevent anyone from using it, but we can Javadoc it as unstable. We can work out the language now in an updated patch since there's currently no Javadoc on the factory interface.

Dependencies on AST/ANTLR: it does make such changes more expensive in terms of impact analysis and migration, but it doesn't really prevent us in any way, does it?

Given that we've agreed at the high level on the approach of creating Howl as a wrapper around Hive (reusing as much as possible of what's already there), can you suggest an alternative mechanism that addresses the requirements while minimizing the injection of Howl behavior directly into Hive itself? If it were something generic like a bitmask of allowed operations, I could kind of see it, but the validation logic is more involved than that (and may become even more so over time). I wasn't able to come up with anything clean on that front myself, which is why I suggested the factoring approach to Pradeep originally.

Apologies for not getting stuff aired out sooner.

> Ability to plug custom Semantic Analyzers for Hive Grammar
> ----------------------------------------------------------
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch
>
> It will be useful if Semantic Analysis phase is made pluggable such that other projects can do custom analysis of hive queries before doing metastore operations on them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Build Crashing on Hive 0.5 Release
Hi Folks

I'm a Hadoop contributor and am presently working to get both Hadoop and Hive running on alternate JREs such as Apache Harmony and IBM Java.

I noticed when building and running the functional tests ("clean test tar") for the Hive 0.5 release (i.e. not nightly build), the build crashes right after running org.apache.hadoop.hive.ql.tool.TestLineageInfo. In addition, the TestCLIDriver Test Case fails as well. This is all using SUN JDK 1.60_14. I'm running on a SLES 10 system.

This is a little odd, given that this is a release and not a nightly build. Although, it's not uncommon for me to see Hudson pass tests that fail when running locally. Can someone confirm the build works for them?

This is my build script:

#!/bin/sh

# Set Build Dependencies
set PATH=$PATH:/home/hive/Java-Versions/jdk1.6.0_14/bin/
export ANT_HOME=/home/hive/Test-Dependencies/apache-ant-1.7.1
export JAVA_HOME=/home/hive/Java-Versions/jdk1.6.0_14
export BUILD_DIR=/home/hive/hive-0.5.0-build
export HIVE_BUILD=$BUILD_DIR/build
export HIVE_INSTALL=$BUILD_DIR/hive-0.5.0-dev/
export HIVE_SRC=$HIVE_INSTALL/src
export PATH=$PATH:$ANT_HOME/bin

# Define Hadoop Version to Use
HADOOP_VER=0.20.2

# Run Build and Unit Test
cd $HIVE_SRC
ant -Dtarget.dir=$HIVE_BUILD -Dhadoop.version=$HADOOP_VER clean test tar > $BUILD_DIR/hiveSUN32Build.out


Regards
Steve Watt
[jira] Updated: (HIVE-1609) Support partition filtering in metastore
[ https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ajay Kidave updated HIVE-1609:
------------------------------

    Status: Patch Available  (was: Open)
    Release Note: Added support for a new listPartitionsByFilter API in HiveMetaStoreClient. This returns the list of partitions matching a specified partition filter. The filter supports "=", "!=", ">", "<", ">=", "<=" and "LIKE" operations on partition keys of type string. "AND" and "OR" logical operations are supported in the filter. So for example, for a table having partition keys country and state, the filter can be 'country = "USA" AND (state = "CA" OR state = "AZ")'

> Support partition filtering in metastore
> -----------------------------------------
>
> Key: HIVE-1609
> URL: https://issues.apache.org/jira/browse/HIVE-1609
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Metastore
> Reporter: Ajay Kidave
> Fix For: 0.7.0
> Attachments: hive_1609.patch, hive_1609_2.patch
>
> The metastore needs to have support for returning a list of partitions based on user specified filter conditions. This will be useful for tools which need to do partition pruning. Howl is one such use case. The way partition pruning is done during hive query execution need not be changed.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
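A short sketch of how a metastore client might exercise the new call, assuming the API lands as described in the release note; the database name, table name, and filter below are made-up examples.

    import java.util.List;
    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
    import org.apache.hadoop.hive.metastore.api.Partition;

    public class PartitionFilterExample {
      public static void main(String[] args) throws Exception {
        HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
        try {
          // Filter grammar per the release note: string partition keys with =, !=, <, <=, >, >=, LIKE, AND, OR.
          String filter = "country = \"USA\" AND (state = \"CA\" OR state = \"AZ\")";
          List<Partition> parts = client.listPartitionsByFilter("default", "sales", filter, (short) -1);
          for (Partition p : parts) {
            System.out.println(p.getValues());   // print the partition key values of each match
          }
        } finally {
          client.close();
        }
      }
    }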
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905665#action_12905665 ]

Carl Steinbach commented on HIVE-1546:
--------------------------------------

bq. Can we get agreement from the Howl team that even though we're introducing this dependency now, we will not let its existence hinder future semantic analyzer refactoring within Hive?

What about other projects that use this feature? How do we get them to agree to this, or how do we prevent them from using it? The new configuration property is documented in hive-default.xml, which implies that it's open to everyone.

bq. one possible refinement would be to limit the public interface to just validation (as opposed to full semantic analysis). In that case, we would have HiveStmtValidatorFactory producing HiveStmtValidator with just a single method validate().

This reduces the scope of the dependency, but doesn't eliminate it. Plugins would presumably depend on the structure of the AST that they are trying to validate, which in turn would limit our ability to refactor the grammar or to replace ANTLR with another parser generator.

> Ability to plug custom Semantic Analyzers for Hive Grammar
> ----------------------------------------------------------
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch
>
> It will be useful if Semantic Analysis phase is made pluggable such that other projects can do custom analysis of hive queries before doing metastore operations on them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905656#action_12905656 ]

John Sichi commented on HIVE-1546:
----------------------------------

For the last sentence, I meant "If Howl's CLI customized behavior is going to need to influence more than just validation"

> Ability to plug custom Semantic Analyzers for Hive Grammar
> ----------------------------------------------------------
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch
>
> It will be useful if Semantic Analysis phase is made pluggable such that other projects can do custom analysis of hive queries before doing metastore operations on them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905655#action_12905655 ]

John Sichi commented on HIVE-1546:
----------------------------------

@Carl: I understand your concern, but this seemed like the least intrusive approach as opposed to continually patching Hive to refine what Howl's CLI wants to support at a given point in time (which really has nothing to do with Hive). The override approach allows that behavior to be factored completely out into Howl. A number of our existing extensibility interfaces (e.g. StorageHandler) already have similar issues regarding impact from continual refactoring, so I expect an across-the-board SPI stabilization effort to be required in the future (with corresponding migrations from old to new). This will need to be part of that effort.

@Ashutosh: I hit the hang you mentioned, so I can retry tests with your latest patch. But let's resolve the approach with Carl first. In particular, can we get agreement from the Howl team that even though we're introducing this dependency now, we will not let its existence hinder future semantic analyzer refactoring within Hive? As long as we all stay in frequent communication, we can make that work.

@Both: one possible refinement would be to limit the public interface to just validation (as opposed to full semantic analysis). In that case, we would have HiveStmtValidatorFactory producing HiveStmtValidator with just a single method validate(). This would also remove the unpleasantness of having a factory returning a base class rather than an interface. However, if CLI is going to need to do more than just validation, then this isn't good enough.

> Ability to plug custom Semantic Analyzers for Hive Grammar
> ----------------------------------------------------------
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch
>
> It will be useful if Semantic Analysis phase is made pluggable such that other projects can do custom analysis of hive queries before doing metastore operations on them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
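For concreteness, the validation-only variant floated under "@Both" could be as small as the following. The names here are hypothetical and only sketch the shape of the hook, not any committed Hive interface; the AST and exception types are the existing ones from org.apache.hadoop.hive.ql.parse.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hive.ql.parse.ASTNode;
    import org.apache.hadoop.hive.ql.parse.SemanticException;

    /** Hypothetical validation-only hook: reject a statement before Hive analyzes it further. */
    public interface HiveStmtValidator {
      void validate(ASTNode ast, Configuration conf) throws SemanticException;
    }

    /** Hypothetical factory, mirroring the HiveSemanticAnalyzerFactory idea but limited to validation. */
    interface HiveStmtValidatorFactory {
      HiveStmtValidator createValidator(Configuration conf);
    }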
[jira] Updated: (HIVE-1609) Support partition filtering in metastore
[ https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kidave updated HIVE-1609: -- Attachment: hive_1609_2.patch Thanks for the review, Carl. JavaCC is already used in the Hive serde code, so it is not a completely new dependency for Hive. JavaCC has issues with generating proper errors for multi-line inputs, but since we are using it only for a small filter string, this issue should not arise. The build approach is the same as the one taken in serde, i.e. the code is regenerated only if javacc.home is defined. Regarding throwing Unknown[DB|Table]Exception: it would require an extra database call to first check whether the database is valid, so I have changed it to throw a NoSuchObjectException saying db.table does not exist if the getMTable operation fails. I have attached a patch which addresses the other issues. > Support partition filtering in metastore > > > Key: HIVE-1609 > URL: https://issues.apache.org/jira/browse/HIVE-1609 > Project: Hadoop Hive > Issue Type: New Feature > Components: Metastore >Reporter: Ajay Kidave > Fix For: 0.7.0 > > Attachments: hive_1609.patch, hive_1609_2.patch > > > The metastore needs to have support for returning a list of partitions based > on user-specified filter conditions. This will be useful for tools which need > to do partition pruning. Howl is one such use case. The way partition pruning > is done during hive query execution need not be changed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
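A sketch of how a tool such as Howl might exercise the new filter support once the patch lands. The client method name listPartitionsByFilter, its signature, and the table default.page_view with partition keys ds/hr are assumptions for illustration only; the authoritative API is whatever hive_1609_2.patch actually adds:

{noformat}
import java.util.List;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

public class PartitionFilterExample {
  public static void main(String[] args) throws Exception {
    HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
    // Single-line filter over (hypothetical) partition keys ds and hr; keeping the
    // filter to one line sidesteps the multi-line error reporting limitation noted above.
    String filter = "ds = \"2010-09-01\" and hr > \"10\"";
    // Assumed client call for the new metastore API; (short) -1 is taken to mean "no limit".
    List<Partition> parts =
        client.listPartitionsByFilter("default", "page_view", filter, (short) -1);
    System.out.println("Matched " + parts.size() + " partitions");
    client.close();
  }
}
{noformat}

Per the comment above, if default.page_view does not exist the call fails with a NoSuchObjectException ("db.table does not exist") rather than a more specific Unknown[DB|Table]Exception.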
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905640#action_12905640 ] Ashutosh Chauhan commented on HIVE-1546: @Carl, * Yes, the main motivating use case is to provide an alternate DDL CLI tool (hopefully not a crippled one *smiles*). The reason for that is to enforce certain use cases on DDL commands in the Howl CLI. More details on that are here: http://wiki.apache.org/pig/Howl/HowlCliFuncSpec If you have questions about why we are making such decisions in Howl, I encourage you to post them on the howl-dev list (howl...@yahoogroups.com) and we can discuss them there. * I don't understand what you mean by making "SemanticAnalyzer a public API". This patch just lets other tools do some semantic analysis of the query and then use Hive to do further processing (if the tool chooses to do so). The important point here is *other tools*. This in no way forces any changes to Hive behavior. Hive can continue to have its own semantic analyzer and do any sort of semantic analysis of the query. Hive is making no guarantees to any tool. * Hive doesn't care about INPUTDRIVER and OUTPUTDRIVER, and this patch is not asking it to. I don't see any way that it provides a mechanism for defining tables in the MetaStore that Hive can't read from or write to. @John, Do you want me to make any further changes, or are we good to go? > Ability to plug custom Semantic Analyzers for Hive Grammar > -- > > Key: HIVE-1546 > URL: https://issues.apache.org/jira/browse/HIVE-1546 > Project: Hadoop Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.7.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, > hive-1546_2.patch > > > It will be useful if Semantic Analysis phase is made pluggable such that > other projects can do custom analysis of hive queries before doing metastore > operations on them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1610) Using CombinedHiveInputFormat causes partToPartitionInfo IOException
[ https://issues.apache.org/jira/browse/HIVE-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammy Yu updated HIVE-1610: --- Attachment: 0003-HIVE-1610.patch > Using CombinedHiveInputFormat causes partToPartitionInfo IOException > -- > > Key: HIVE-1610 > URL: https://issues.apache.org/jira/browse/HIVE-1610 > Project: Hadoop Hive > Issue Type: Bug > Environment: Hadoop 0.20.2 >Reporter: Sammy Yu > Attachments: > 0002-HIVE-1610.-Added-additional-schema-check-to-doGetPar.patch, > 0003-HIVE-1610.patch > > > I have a relatively complicated hive query using CombinedHiveInputFormat: > set hive.exec.dynamic.partition.mode=nonstrict; > set hive.exec.dynamic.partition=true; > set hive.exec.max.dynamic.partitions=1000; > set hive.exec.max.dynamic.partitions.pernode=300; > set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; > INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week) select > distinct keywords.keyword, keywords.domain, keywords.url, keywords.rank, > keywords.universal_rank, keywords.serp_type, keywords.date_indexed, > keywords.search_engine_type, keywords.week from keyword_serp_results keywords > JOIN (select domain, keyword, search_engine_type, week, max_date_indexed, > min(rank) as best_rank from (select keywords1.domain, keywords1.keyword, > keywords1.search_engine_type, keywords1.week, keywords1.rank, > dupkeywords1.max_date_indexed from keyword_serp_results keywords1 JOIN > (select domain, keyword, search_engine_type, week, max(date_indexed) as > max_date_indexed from keyword_serp_results group by > domain,keyword,search_engine_type,week) dupkeywords1 on keywords1.keyword = > dupkeywords1.keyword AND keywords1.domain = dupkeywords1.domain AND > keywords1.search_engine_type = dupkeywords1.search_engine_type AND > keywords1.week = dupkeywords1.week AND keywords1.date_indexed = > dupkeywords1.max_date_indexed) dupkeywords2 group by > domain,keyword,search_engine_type,week,max_date_indexed ) dupkeywords3 on > keywords.keyword = dupkeywords3.keyword AND keywords.domain = > dupkeywords3.domain AND keywords.search_engine_type = > dupkeywords3.search_engine_type AND keywords.week = dupkeywords3.week AND > keywords.date_indexed = dupkeywords3.max_date_indexed AND keywords.rank = > dupkeywords3.best_rank; > > This query use to work fine until I updated to r991183 on trunk and started > getting this error: > java.io.IOException: cannot find dir = > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/00_0 > in > partToPartitionInfo: > [hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=417/week=201035/day=20100829, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=418/week=201035/day=20100829, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=419/week=201035/day=20100829, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100829, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100831] > at > org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.(CombineHiveInputFormat.java:100) > 
at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312) > at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810) > at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781) > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730) > at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610) > at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108) > This query works if I don't change the hive.input.format. > set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; > I've narrowed down this issue to the commit for HIVE-1510. If I take out the > changeset from r987746, everything works as before. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1610) Using CombinedHiveInputFormat causes partToPartitionInfo IOException
[ https://issues.apache.org/jira/browse/HIVE-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905635#action_12905635 ] Sammy Yu commented on HIVE-1610: Yongqiang, thanks for taking a look at this. If I take out the URI scheme checks, the original TestHiveFileFormatUtils.testGetPartitionDescFromPathRecursively test case fails: [junit] Running org.apache.hadoop.hive.ql.io.TestHiveFileFormatUtils [junit] junit.framework.TestListener: tests to run: 2 [junit] junit.framework.TestListener: startTest(testGetPartitionDescFromPathRecursively) [junit] junit.framework.TestListener: addFailure(testGetPartitionDescFromPathRecursively, hdfs:///tbl/par1/part2/part3 should return null expected: but was:) [junit] junit.framework.TestListener: endTest(testGetPartitionDescFromPathRecursively) [junit] junit.framework.TestListener: startTest(testGetPartitionDescFromPathWithPort) [junit] junit.framework.TestListener: endTest(testGetPartitionDescFromPathWithPort) [junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.091 sec [junit] Test org.apache.hadoop.hive.ql.io.TestHiveFileFormatUtils FAILED hdfs:///tbl/par1/part2/part3 should not match any PartitionDesc since the path in the map is file:///tbl/par1/part2/part3. I will attach the svn version of the patch shortly. > Using CombinedHiveInputFormat causes partToPartitionInfo IOException > -- > > Key: HIVE-1610 > URL: https://issues.apache.org/jira/browse/HIVE-1610 > Project: Hadoop Hive > Issue Type: Bug > Environment: Hadoop 0.20.2 >Reporter: Sammy Yu > Attachments: > 0002-HIVE-1610.-Added-additional-schema-check-to-doGetPar.patch > > > I have a relatively complicated hive query using CombinedHiveInputFormat: > set hive.exec.dynamic.partition.mode=nonstrict; > set hive.exec.dynamic.partition=true; > set hive.exec.max.dynamic.partitions=1000; > set hive.exec.max.dynamic.partitions.pernode=300; > set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; > INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week) select > distinct keywords.keyword, keywords.domain, keywords.url, keywords.rank, > keywords.universal_rank, keywords.serp_type, keywords.date_indexed, > keywords.search_engine_type, keywords.week from keyword_serp_results keywords > JOIN (select domain, keyword, search_engine_type, week, max_date_indexed, > min(rank) as best_rank from (select keywords1.domain, keywords1.keyword, > keywords1.search_engine_type, keywords1.week, keywords1.rank, > dupkeywords1.max_date_indexed from keyword_serp_results keywords1 JOIN > (select domain, keyword, search_engine_type, week, max(date_indexed) as > max_date_indexed from keyword_serp_results group by > domain,keyword,search_engine_type,week) dupkeywords1 on keywords1.keyword = > dupkeywords1.keyword AND keywords1.domain = dupkeywords1.domain AND > keywords1.search_engine_type = dupkeywords1.search_engine_type AND > keywords1.week = dupkeywords1.week AND keywords1.date_indexed = > dupkeywords1.max_date_indexed) dupkeywords2 group by > domain,keyword,search_engine_type,week,max_date_indexed ) dupkeywords3 on > keywords.keyword = dupkeywords3.keyword AND keywords.domain = > dupkeywords3.domain AND keywords.search_engine_type = > dupkeywords3.search_engine_type AND keywords.week = dupkeywords3.week AND > keywords.date_indexed = dupkeywords3.max_date_indexed AND keywords.rank = > dupkeywords3.best_rank; > > This query use to work fine until I updated to r991183 on trunk and started > getting this error: > java.io.IOException: cannot find dir 
= > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/00_0 > in > partToPartitionInfo: > [hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=417/week=201035/day=20100829, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=418/week=201035/day=20100829, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=419/week=201035/day=20100829, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100829, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100831] > at > org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.(CombineHiveInputFormat.java:100) > at > org.apache.hadoop.hive.ql.io.CombineHiveIn
[jira] Commented: (HIVE-1476) Hive's metastore when run as a thrift service creates directories as the service user instead of the real user issuing create table/alter table etc.
[ https://issues.apache.org/jira/browse/HIVE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905622#action_12905622 ] Carl Steinbach commented on HIVE-1476: -- @Venkatesh: THRIFT-814 covers adding SPNEGO support to Thrift. > Hive's metastore when run as a thrift service creates directories as the > service user instead of the real user issuing create table/alter table etc. > > > Key: HIVE-1476 > URL: https://issues.apache.org/jira/browse/HIVE-1476 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.6.0, 0.7.0 >Reporter: Pradeep Kamath > Attachments: HIVE-1476.patch, HIVE-1476.patch.2 > > > If the thrift metastore service is running as the user "hive" then all table > directories as a result of create table are created as that user rather than > the user who actually issued the create table command. This is different > semantically from non-thrift mode (i.e. local mode) when clients directly > connect to the metastore. In the latter case, directories are created as the > real user. The thrift mode should do the same. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1476) Hive's metastore when run as a thrift service creates directories as the service user instead of the real user issuing create table/alter table etc.
[ https://issues.apache.org/jira/browse/HIVE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905623#action_12905623 ] Carl Steinbach commented on HIVE-1476: -- Edit: I mean THRIFT-889. > Hive's metastore when run as a thrift service creates directories as the > service user instead of the real user issuing create table/alter table etc. > > > Key: HIVE-1476 > URL: https://issues.apache.org/jira/browse/HIVE-1476 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.6.0, 0.7.0 >Reporter: Pradeep Kamath > Attachments: HIVE-1476.patch, HIVE-1476.patch.2 > > > If the thrift metastore service is running as the user "hive" then all table > directories as a result of create table are created as that user rather than > the user who actually issued the create table command. This is different > semantically from non-thrift mode (i.e. local mode) when clients directly > connect to the metastore. In the latter case, directories are created as the > real user. The thrift mode should do the same. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1611) Add alternative search-provider to Hive site
[ https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi reassigned HIVE-1611: Assignee: Alex Baranau > Add alternative search-provider to Hive site > > > Key: HIVE-1611 > URL: https://issues.apache.org/jira/browse/HIVE-1611 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Alex Baranau >Assignee: Alex Baranau >Priority: Minor > Attachments: HIVE-1611.patch > > > Use search-hadoop.com service to make available search in Hive sources, MLs, > wiki, etc. > This was initially proposed on user mailing list. The search service was > already added in site's skin (common for all Hadoop related projects) before > so this issue is about enabling it for Hive. The ultimate goal is to use it > at all Hadoop's sub-projects' sites. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1610) Using CombinedHiveInputFormat causes partToPartitionInfo IOException
[ https://issues.apache.org/jira/browse/HIVE-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905594#action_12905594 ] He Yongqiang commented on HIVE-1610: 1. just remove {noformat} && (dir.toUri().getScheme() == null || dir.toUri().getScheme().trim() .equals("")) {noformat} will make things work. 2. you need to use svn (not git) to generate the patch. > Using CombinedHiveInputFormat causes partToPartitionInfo IOException > -- > > Key: HIVE-1610 > URL: https://issues.apache.org/jira/browse/HIVE-1610 > Project: Hadoop Hive > Issue Type: Bug > Environment: Hadoop 0.20.2 >Reporter: Sammy Yu > Attachments: > 0002-HIVE-1610.-Added-additional-schema-check-to-doGetPar.patch > > > I have a relatively complicated hive query using CombinedHiveInputFormat: > set hive.exec.dynamic.partition.mode=nonstrict; > set hive.exec.dynamic.partition=true; > set hive.exec.max.dynamic.partitions=1000; > set hive.exec.max.dynamic.partitions.pernode=300; > set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; > INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week) select > distinct keywords.keyword, keywords.domain, keywords.url, keywords.rank, > keywords.universal_rank, keywords.serp_type, keywords.date_indexed, > keywords.search_engine_type, keywords.week from keyword_serp_results keywords > JOIN (select domain, keyword, search_engine_type, week, max_date_indexed, > min(rank) as best_rank from (select keywords1.domain, keywords1.keyword, > keywords1.search_engine_type, keywords1.week, keywords1.rank, > dupkeywords1.max_date_indexed from keyword_serp_results keywords1 JOIN > (select domain, keyword, search_engine_type, week, max(date_indexed) as > max_date_indexed from keyword_serp_results group by > domain,keyword,search_engine_type,week) dupkeywords1 on keywords1.keyword = > dupkeywords1.keyword AND keywords1.domain = dupkeywords1.domain AND > keywords1.search_engine_type = dupkeywords1.search_engine_type AND > keywords1.week = dupkeywords1.week AND keywords1.date_indexed = > dupkeywords1.max_date_indexed) dupkeywords2 group by > domain,keyword,search_engine_type,week,max_date_indexed ) dupkeywords3 on > keywords.keyword = dupkeywords3.keyword AND keywords.domain = > dupkeywords3.domain AND keywords.search_engine_type = > dupkeywords3.search_engine_type AND keywords.week = dupkeywords3.week AND > keywords.date_indexed = dupkeywords3.max_date_indexed AND keywords.rank = > dupkeywords3.best_rank; > > This query use to work fine until I updated to r991183 on trunk and started > getting this error: > java.io.IOException: cannot find dir = > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/00_0 > in > partToPartitionInfo: > [hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=417/week=201035/day=20100829, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=418/week=201035/day=20100829, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=419/week=201035/day=20100829, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100829, > hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100831] > at > 
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.(CombineHiveInputFormat.java:100) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312) > at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810) > at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781) > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730) > at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610) > at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108) > This query works if I don't change the hive.input.format. > set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; > I've narrowed down this issue to the commit for HIVE-1510. If I take out the > changeset from r987746, everything works as before. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
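The check quoted above is what keeps a split path from matching a map entry with a different filesystem scheme, which is exactly the hdfs:/// vs. file:/// test failure reported earlier in this thread. A self-contained sketch of that distinction (illustrative only, not the actual HiveFileFormatUtils code):

{noformat}
import java.net.URI;

public class PathSchemeMatch {

  // A concrete scheme mismatch (hdfs vs. file) should block the match, while a
  // path written without any scheme should still be allowed to match.
  static boolean sameOrMissingScheme(URI dir, URI key) {
    String a = dir.getScheme();
    String b = key.getScheme();
    if (a == null || a.trim().isEmpty() || b == null || b.trim().isEmpty()) {
      return true;
    }
    return a.equalsIgnoreCase(b);
  }

  public static void main(String[] args) {
    URI hdfs = URI.create("hdfs:///tbl/par1/part2/part3");
    URI file = URI.create("file:///tbl/par1/part2/part3");
    URI bare = URI.create("/tbl/par1/part2/part3");
    System.out.println(sameOrMissingScheme(hdfs, file)); // false: the failing test case above
    System.out.println(sameOrMissingScheme(bare, file)); // true: schemeless paths still match
  }
}
{noformat}

Simply deleting the quoted condition would make the first case match as well, which appears to be why the attached patch adds an additional scheme check instead of removing it outright.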
[jira] Updated: (HIVE-1611) Add alternative search-provider to Hive site
[ https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Baranau updated HIVE-1611: --- Status: Patch Available (was: Open) > Add alternative search-provider to Hive site > > > Key: HIVE-1611 > URL: https://issues.apache.org/jira/browse/HIVE-1611 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Alex Baranau >Priority: Minor > Attachments: HIVE-1611.patch > > > Use search-hadoop.com service to make available search in Hive sources, MLs, > wiki, etc. > This was initially proposed on user mailing list. The search service was > already added in site's skin (common for all Hadoop related projects) before > so this issue is about enabling it for Hive. The ultimate goal is to use it > at all Hadoop's sub-projects' sites. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1611) Add alternative search-provider to Hive site
[ https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Baranau updated HIVE-1611: --- Attachment: HIVE-1611.patch Attached a patch which enables the search-hadoop.com search service for the site > Add alternative search-provider to Hive site > > > Key: HIVE-1611 > URL: https://issues.apache.org/jira/browse/HIVE-1611 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Alex Baranau >Priority: Minor > Attachments: HIVE-1611.patch > > > Use search-hadoop.com service to make available search in Hive sources, MLs, > wiki, etc. > This was initially proposed on user mailing list. The search service was > already added in site's skin (common for all Hadoop related projects) before > so this issue is about enabling it for Hive. The ultimate goal is to use it > at all Hadoop's sub-projects' sites. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1611) Add alternative search-provider to Hive site
Add alternative search-provider to Hive site Key: HIVE-1611 URL: https://issues.apache.org/jira/browse/HIVE-1611 Project: Hadoop Hive Issue Type: Improvement Reporter: Alex Baranau Priority: Minor Use search-hadoop.com service to make available search in Hive sources, MLs, wiki, etc. This was initially proposed on user mailing list. The search service was already added in site's skin (common for all Hadoop related projects) before so this issue is about enabling it for Hive. The ultimate goal is to use it at all Hadoop's sub-projects' sites. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1476) Hive's metastore when run as a thrift service creates directories as the service user instead of the real user issuing create table/alter table etc.
[ https://issues.apache.org/jira/browse/HIVE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905470#action_12905470 ] Venkatesh S commented on HIVE-1476: --- @Todd, Thrift over HTTP transport (THRIFT-814) can use kerberos over SPNEGO. > Hive's metastore when run as a thrift service creates directories as the > service user instead of the real user issuing create table/alter table etc. > > > Key: HIVE-1476 > URL: https://issues.apache.org/jira/browse/HIVE-1476 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.6.0, 0.7.0 >Reporter: Pradeep Kamath > Attachments: HIVE-1476.patch, HIVE-1476.patch.2 > > > If the thrift metastore service is running as the user "hive" then all table > directories as a result of create table are created as that user rather than > the user who actually issued the create table command. This is different > semantically from non-thrift mode (i.e. local mode) when clients directly > connect to the metastore. In the latter case, directories are created as the > real user. The thrift mode should do the same. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1539) Concurrent metastore threading problem
[ https://issues.apache.org/jira/browse/HIVE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bennie Schut updated HIVE-1539: --- Attachment: ClassLoaderResolver.patch Ok, still testing it, but this is a temporary fix: we add our own synchronized version of the classloader. Just make sure you add these properties and it should work: datanucleus.classLoaderResolverName = syncloader > Concurrent metastore threading problem > --- > > Key: HIVE-1539 > URL: https://issues.apache.org/jira/browse/HIVE-1539 > Project: Hadoop Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.7.0 >Reporter: Bennie Schut >Assignee: Bennie Schut > Attachments: ClassLoaderResolver.patch, thread_dump_hanging.txt > > > When running hive as a service and running a high number of queries > concurrently I end up with multiple threads running at 100% cpu without any > progress. > Looking at these threads I notice this thread(484e): > at > org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:598) > But on a different thread(63a2): > at > org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoReplaceField(MStorageDescriptor.java) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
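The workaround described above amounts to routing every call to the resolver through a single lock. A self-contained sketch of that pattern follows; the real DataNucleus JDOClassLoaderResolver has a larger interface with different signatures, so the ClassResolver type below is a stand-in for illustration, not the actual API:

{noformat}
/**
 * Sketch of the "synchronized classloader resolver" workaround. The nested
 * ClassResolver interface stands in for DataNucleus's resolver API; the point
 * is only that every call is serialized on one lock.
 */
public class SyncClassLoaderResolver {

  /** Minimal stand-in for the resolver API; not the actual DataNucleus interface. */
  public interface ClassResolver {
    Class<?> classForName(String name) throws ClassNotFoundException;
  }

  private final Object lock = new Object();
  private final ClassResolver delegate;

  public SyncClassLoaderResolver(ClassResolver delegate) {
    this.delegate = delegate;
  }

  public Class<?> classForName(String name) throws ClassNotFoundException {
    synchronized (lock) { // serialize access to the non-thread-safe delegate
      return delegate.classForName(name);
    }
  }
}
{noformat}

The datanucleus.classLoaderResolverName property mentioned above is what selects the custom resolver, presumably registered under the name syncloader by the attached ClassLoaderResolver.patch.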
[jira] Commented: (HIVE-1539) Concurrent metastore threading problem
[ https://issues.apache.org/jira/browse/HIVE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905455#action_12905455 ] Bennie Schut commented on HIVE-1539: JDOClassLoaderResolver doesn't seem thread safe. That's a bit of a surprise. I filed a bug with datanucleus: http://www.datanucleus.org/servlet/jira/browse/NUCCORE-559 I just made my own threadsafe version of the JDOClassLoaderResolver and am loading it to see if that fixes it. Will probably take a few days to be sure it got fixed. > Concurrent metastore threading problem > --- > > Key: HIVE-1539 > URL: https://issues.apache.org/jira/browse/HIVE-1539 > Project: Hadoop Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.7.0 >Reporter: Bennie Schut >Assignee: Bennie Schut > Attachments: thread_dump_hanging.txt > > > When running hive as a service and running a high number of queries > concurrently I end up with multiple threads running at 100% cpu without any > progress. > Looking at these threads I notice this thread(484e): > at > org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:598) > But on a different thread(63a2): > at > org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoReplaceField(MStorageDescriptor.java) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-849) .. not supported
[ https://issues.apache.org/jira/browse/HIVE-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-849. - Resolution: Duplicate > .. not supported > > > Key: HIVE-849 > URL: https://issues.apache.org/jira/browse/HIVE-849 > Project: Hadoop Hive > Issue Type: New Feature >Reporter: Namit Jain >Assignee: He Yongqiang > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.