[jira] Commented: (HIVE-1402) Add parallel ORDER BY to Hive

2010-06-18 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880120#action_12880120
 ] 

Jeff Zhang commented on HIVE-1402:
--

Hi, I have made a draft implementation for one special case. It works, but 
because it covers only that one case, it contains some hard-coding. I hope 
someone can give some help or guidance on the next step. 
One big problem with parallel ORDER BY is that the output key type of ExecMapper 
is HiveKey, which has already been serialized by LazyBinarySerDe, so the 
original column type is lost at that point. Sampling and partitioning, however, 
need the original column type.

The following is my initial design.

1. During the parse stage, extract a SampleOperator that has two children: 
TableScanOperator and SelectOperator. (I am not familiar with the Hive parse 
stage, and the code is not clear to me. Could anyone help or recommend some 
documentation about the Hive parser?)

2. Modify the TotalOrderPartitioner: add a Deserializer that converts the 
HiveKey back to its original column type, and deserialize the HiveKey in the 
getPartition() method. 
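
As a rough illustration of step 2, here is a minimal sketch in plain Java with 
no Hive or Hadoop dependencies. It is not the draft implementation and not 
Hadoop's TotalOrderPartitioner. The class name DecodingRangePartitioner and the 
decoder function (a stand-in for a Deserializer) are assumptions made for this 
example. The point it shows is that getPartition() first decodes the serialized 
key to its original type and only then compares it against the sorted split 
points.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.function.Function;

/** Illustrative sketch only; names and structure are hypothetical. */
final class DecodingRangePartitioner<K extends Comparable<K>> {
    // Sorted split points; n split points define n + 1 partitions.
    private final K[] splitPoints;
    // Hypothetical stand-in for a Deserializer: serialized bytes -> typed key.
    private final Function<byte[], K> decoder;

    DecodingRangePartitioner(K[] sortedSplitPoints, Function<byte[], K> decoder) {
        this.splitPoints = sortedSplitPoints.clone();
        this.decoder = decoder;
    }

    /** Decodes the key, then binary-searches the split points. */
    int getPartition(byte[] serializedKey) {
        K key = decoder.apply(serializedKey);
        int pos = Arrays.binarySearch(splitPoints, key);
        // Found at index i: the key goes to partition i + 1.
        // Not found: pos == -(insertionPoint) - 1, so use insertionPoint.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        Integer[] splits = {10, 20};
        DecodingRangePartitioner<Integer> partitioner =
            new DecodingRangePartitioner<>(splits, b -> ByteBuffer.wrap(b).getInt());
        for (int v : new int[] {-1, 5, 10, 15, 25}) {
            byte[] raw = ByteBuffer.allocate(4).putInt(v).array();
            System.out.println(v + " -> partition " + partitioner.getPartition(raw));
        }
    }
}
```

With split points 10 and 20, the keys -1 and 5 go to partition 0, 10 and 15 to 
partition 1, and 25 to partition 2. The key -1 is the interesting case: its 
raw bytes are all 0xFF, so an unsigned byte-wise comparison would place it 
last, which is why the comparison is done on the decoded value.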

Any comments and help are welcome.



> Add parallel ORDER BY to Hive
> -
>
> Key: HIVE-1402
> URL: https://issues.apache.org/jira/browse/HIVE-1402
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.5.0
> Reporter: Jeff Hammerbacher
>





Re: Is anybody working on the globally "order by" of hive ?

2010-06-11 Thread Jeff Zhang
Great, I can work on this issue.




On Sat, Jun 12, 2010 at 2:02 PM, Jeff Hammerbacher  wrote:
> See https://issues.apache.org/jira/browse/HIVE-1402.
>
> On Fri, Jun 11, 2010 at 1:22 PM, John Sichi  wrote:
>
>> If someone is interested in adding parallel ORDER BY to Hive (using
>> TotalOrderPartitioner), here's a good starting point:
>>
>> http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad
>>
>> The goal would be to take that manual two-step sample-then-sort process and
>> turn it into an automatic plan within Hive.  I have a better example for the
>> sampling query which I haven't published yet.
>>
>> We would also need to name the final output files in such a way that the
>> total order could be iterated via the filenames.
>>
>
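
The two-step sample-then-sort process quoted above, together with the point 
about output file names, can be sketched in plain Java as follows. This is only 
an illustration under stated assumptions, not Hive code: the class 
SampleThenSortSketch, the evenly spaced choice of split points, and the 
zero-padded "part-NNNNN" naming scheme are all hypothetical choices made for 
this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Illustrative sketch only; names and the naming scheme are hypothetical. */
final class SampleThenSortSketch {

    /** Step 1: pick numPartitions - 1 split points, evenly spaced in the sorted sample. */
    static <K extends Comparable<K>> List<K> splitPoints(List<K> sample, int numPartitions) {
        if (sample.isEmpty() || numPartitions < 1) {
            throw new IllegalArgumentException("need a non-empty sample and >= 1 partition");
        }
        List<K> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<K> splits = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            splits.add(sorted.get(i * sorted.size() / numPartitions));
        }
        return splits;
    }

    /** Zero-padded name, so sorting the file names also sorts the partitions. */
    static String outputFileName(int partition) {
        return String.format(Locale.ROOT, "part-%05d", partition);
    }

    public static void main(String[] args) {
        List<Integer> sample = Arrays.asList(7, 3, 9, 1, 5, 8, 2, 6, 4);
        System.out.println("split points: " + splitPoints(sample, 3));
        for (int p = 0; p < 3; p++) {
            System.out.println(outputFileName(p));
        }
    }
}
```

Step 2 would then range-partition the full data set by these split points and 
sort within each partition, so that reading the output files in name order 
yields the total order. The zero padding matters because, compared as plain 
strings, "part-10" would sort before "part-2".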



-- 
Best Regards

Jeff Zhang


Is anybody working on the globally "order by" of hive ?

2010-06-11 Thread Jeff Zhang
Hi all,

According to the Hive wiki, Hive does not support a global "order by";
Hive's "sort by" only sorts within each reducer. Our team thinks a
global "order by" is an important feature for users, so I am wondering
whether anybody is working on it. I would be very interested in getting
involved.


-- 
Best Regards

Jeff Zhang


Suggest Hive to provide a simple java api for unit test in local mode

2010-05-13 Thread Jeff Zhang
Hi all,

I'd like to debug a Hive program and want to use Hive's raw Java API.
I know there is a Thrift API for Hive, but it is not convenient for me,
especially for debugging and unit testing.
I also noticed that the unit test (TestExecDriver) in Hive uses a shim
(it calls ExecDriver in another process), which is also not convenient
for testing. After some hacking, the following code finally executes
successfully (I created the table before running it). So I'd like to
suggest that Hive provide a simpler Java API for users (a wrapper around
Hive's internal Java API) and allow users to bypass the shim and use
ExecDriver directly.

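// code snippet

// Compile the query with the regular Hive Driver.
HiveConf conf = new HiveConf(ExecDriver.class);
Driver driver = new Driver(conf);
driver.compile("select name from test group by name");

// Read the query plan back from queryplan.xml in the local scratch dir.
// substring(5) drops the first five characters of the scratch dir string
// (presumably a "file:" prefix).
String planPath = driver.ctx.getLocalScratchDir().substring(5)
    + File.separator + "queryplan.xml";
QueryPlan plan = Utilities.deserializeQueryPlan(new FileInputStream(planPath), conf);

// Run the first root task in-process with ExecDriver instead of the shim.
Task task = plan.getRootTasks().get(0);
ExecDriver eDriver = new ExecDriver((MapredWork) task.getWork(), new JobConf(), false);
eDriver.execute(new DriverContext());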

-- 
Best Regards

Jeff Zhang


Re: Will hive support PL/SQL?

2010-02-08 Thread Jeff Zhang
I think PL/SQL is a kind of DSL on top of SQL, so it should be possible to
build something similar on top of HiveQL. A Ruby DSL may be one option.


On Mon, Feb 8, 2010 at 4:20 PM, Mafish Liu  wrote:

> Hive supports a SQL-like language named HiveQL.
>
> Refer to http://wiki.apache.org/hadoop/Hive/LanguageManual for more details.
>
> 2010/2/8 jian yi :
> > Hi all,
> >
> > PL/SQL is very convenient. Will Hive support it?
> >
> > Regards
> > Jian Yi
> >
>
>
>
> --
> maf...@gmail.com
>



-- 
Best Regards

Jeff Zhang


Where can I start ?

2009-09-08 Thread Jeff Zhang
Hi all,

I have used Pig for several months. Now I am very interested in Hive and
would like to investigate it and compare these two amazing products.

I also hope to contribute to Hive. When I was learning Pig, I would write a
Pig script and debug it in local mode so that I could see the internal
mechanisms, but it looks like Hive does not have a straightforward Java API
like Pig's. So how can I learn Hive as a developer rather than as a user?
Could anyone give me some suggestions?


Thank you.

Jeff Zhang