On Sep 1, 2012, at 4:38 PM, Renato Marroquín Mogrovejo wrote:

> Hi Travis,
> 
> Thanks a ton for this issue; I know I will enjoy solving it (: I
> have some questions about this JIRA, even though I think I understand
> what the problem is.
> 
> - How do you think I should approach this? I mean if HCat can't send
> the partitions' information through the configuration object, maybe we
> should think of a different way of communicating this information
> (thrift, or the database)?
Neither Thrift nor the database is an option.  You can't count on being able to 
communicate with the client from the map tasks, not to mention you would 
overwhelm the client.  One of the rules of hcat is that the map and reduce tasks 
should never talk to the database, as it isn't sized to handle large numbers of 
tasks talking to it.

My first thought would be to use the distributed cache.  You should only use 
this option when you have a very large number of files.  In that case, write 
the file list to a file, put that file in the distributed cache, and put a 
pointer to it in the job conf instead of the file list itself.

Alan.
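
A minimal sketch of the pointer-in-the-conf idea described above, written in
Python for brevity rather than HCatalog's Java. It is an illustration under
stated assumptions, not HCatalog's implementation: a plain dict stands in for
the job conf, a local temporary directory stands in for the distributed cache,
and the key name and both helper functions are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical key name; HCatalog's real configuration keys are not shown here.
PARTITION_FILE_KEY = "example.partition.list.file"


def publish_partitions(job_conf, partitions, work_dir):
    """Front end: write the partition list to a side file, keep only its path."""
    path = Path(work_dir) / "partitions.json"
    path.write_text(json.dumps(partitions))
    # Assumption: in a real job this file would also be registered with the
    # distributed cache so each task gets a local copy. A local path stands in.
    job_conf[PARTITION_FILE_KEY] = str(path)
    return job_conf


def load_partitions(job_conf):
    """Task side: follow the pointer in the conf and read the list back."""
    return json.loads(Path(job_conf[PARTITION_FILE_KEY]).read_text())


if __name__ == "__main__":
    # Made-up partition locations, purely for illustration.
    partitions = [f"/warehouse/events/dt=2012-08-{d:02d}" for d in range(1, 31)]
    with tempfile.TemporaryDirectory() as work_dir:
        conf = publish_partitions({}, partitions, work_dir)
        assert load_partitions(conf) == partitions
        print(len(conf), "conf entry for", len(partitions), "partitions")
```

The point of the design is that the conf grows by one short entry no matter
how many partitions the job reads; the list itself travels in the side file.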

> - I was looking at HCatLoader but I am not sure if this would be a good
> entry point for the modifications. Any suggestions?
> 
> Thanks again Travis!
> 
> 
> Renato M.
> 
> 
> 2012/8/30 Travis Crawford <[email protected]>:
>> You might be interested in https://issues.apache.org/jira/browse/HCATALOG-453
>> 
>> The issue here is HCatalog queries the HiveMetaStore for info about
>> the partitions to process, and stores that response in the job conf.
>> When processing large numbers of partitions this bloats the job conf
>> beyond what Hadoop will allow and the job fails.
>> 
>> What's interesting about this issue is you'll learn about the main
>> feature of HCatalog - translating db+table+partition_spec into a list
>> of partitions, how HCat handles that internally, and how it's
>> communicated between the frontend & backend. The actual issue is
>> straightforward, but I think spending the time to understand the
>> problem will give a great overview of how HCat works.
>> 
>> Thoughts?
>> 
>> --travis
>> 
>> 
>> 
>> On Thu, Aug 30, 2012 at 4:25 PM, Renato Marroquín Mogrovejo
>> <[email protected]> wrote:
>>> Travis,
>>> 
>>> Thanks a lot for your response! My master's dissertation was about
>>> using statistics to smarten up the Apache Pig rule optimizer, so I would
>>> love to help out with something related, but maybe you can suggest
>>> some interesting JIRAs (not complicated ones but maybe "noobie" ones)
>>> I can start with (:
>>> And yeah, the labels thing is much better than creating a JIRA type for
>>> noobies. Thanks again!
>>> 
>>> 
>>> Renato M.
>>> 
>>> 2012/8/30 Travis Crawford <[email protected]>:
>>>> Hey Renato -
>>>> 
>>>> Awesome! What in particular are you interested in starting out with?
>>>> We can definitely find a starter project for you in that area.
>>>> 
>>>> JIRA issues can have a variety of attributes; the attribute I started
>>>> this thread about is the "issue type".
>>>> 
>>>> JIRA also has "labels", which I think are a great place to indicate
>>>> something would be good for noobies. For example, there could be an
>>>> "issue type" of bug, with "label" noobie.
>>>> 
>>>> Let us know what area you're interested in diving into and we can help
>>>> come up with a starter project for ya.
>>>> 
>>>> --travis
>>>> 
>>>> 
>>>> On Thu, Aug 30, 2012 at 9:21 AM, Renato Marroquín Mogrovejo
>>>> <[email protected]> wrote:
>>>>> Hi all,
>>>>> 
>>>>> I am new to HCatalog but I would like to get involved with the
>>>>> project, and one thing that would totally help is to create an issue
>>>>> type that indicates it is for "newbies". I saw that in Apache Pig they
>>>>> have a special type of issue for this and with this they try to engage
>>>>> more with the community. This would be awesome guys!
>>>>> Thanks in advance!
>>>>> 
>>>>> 
>>>>> Renato M.
>>>>> 
>>>>> 2012/8/30 Travis Crawford <[email protected]>:
>>>>>> Hey hcat gurus -
>>>>>> 
>>>>>> Filing an issue just now I noticed the list of possible option types
>>>>>> is pretty crazy long - any objection to requesting a simplification
>>>>>> to:
>>>>>> 
>>>>>> PROPOSED ISSUE TYPES:
>>>>>> 
>>>>>> Bug - fixing unintended behavior
>>>>>> New Feature - addition of brand-new functionality
>>>>>> Improvement - making existing functionality better
>>>>>> 
>>>>>> CURRENT ISSUE TYPES:
>>>>>> 
>>>>>> Bug
>>>>>> New Feature
>>>>>> Improvement
>>>>>> Test
>>>>>> Wish
>>>>>> Task
>>>>>> New JIRA Project
>>>>>> RTC
>>>>>> TCK Challenge
>>>>>> Question
>>>>>> Temp
>>>>>> Brainstorming
>>>>>> Umbrella
>>>>>> Epic
>>>>>> Dependency upgrade
>>>>>> Suitable Name Search
>>>>>> 
>>>>>> If this sounds good I'll ping the infra folks and try to make this 
>>>>>> happen.
>>>>>> 
>>>>>> --travis
