[jira] Commented: (HIVE-1402) Add parallel ORDER BY to Hive
[ https://issues.apache.org/jira/browse/HIVE-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880120#action_12880120 ]

Jeff Zhang commented on HIVE-1402:
----------------------------------

Hi,

I have made a draft implementation for one special case. It works, but because it covers only that one case, it contains some hard-coding. I hope someone can give me some help or guidance on the next step.

One big problem with parallel ORDER BY is that the output key type of ExecMapper is HiveKey, which has already been serialized by LazyBinarySerDe, so the original column type is lost at that point. Sampling and partitioning, however, need the original column type.

The following is my initial design:

1. During the parse stage, extract one SampleOperator that has two children: a TableScanOperator and a SelectOperator. (I am not familiar with Hive's parse stage, and the code is not clear to me. Could anyone give some help or recommend documentation about the Hive parser?)
2. Modify the TotalOrderPartitioner: add a Deserializer that converts the HiveKey back to its original column type, and deserialize the HiveKey in getPartition().

Any comments and help are welcome.

> Add parallel ORDER BY to Hive
> -----------------------------
>
>                 Key: HIVE-1402
>                 URL: https://issues.apache.org/jira/browse/HIVE-1402
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>   Affects Versions: 0.5.0
>            Reporter: Jeff Hammerbacher
>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
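Step 2 of the design above (deserialize the key inside getPartition(), then pick a range) can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not Hive's or Hadoop's code: the class RangePartitionSketch, its helper methods, and the 4-byte big-endian int encoding are all hypothetical stand-ins, and the encoding is not the format LazyBinarySerDe writes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Sketch only: a stand-alone range partitioner, not Hive's or Hadoop's code.
 * The 4-byte big-endian int encoding is an assumption made for illustration;
 * it is NOT the format LazyBinarySerDe writes.
 */
final class RangePartitionSketch {

    /** Hypothetical stand-in for the Deserializer: bytes back to the typed column value. */
    static int deserializeKey(byte[] serializedKey) {
        return ByteBuffer.wrap(serializedKey).getInt();
    }

    /** Hypothetical helper that builds keys using the same assumed encoding. */
    static byte[] serializeKey(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /**
     * Returns the reducer index for a key. splitPoints must be sorted ascending
     * and distinct; n split points define n + 1 ranges. A key equal to a split
     * point goes to the range above it.
     */
    static int getPartition(byte[] serializedKey, int[] splitPoints) {
        int key = deserializeKey(serializedKey);
        int pos = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

With split points {10, 20}, the keys 5, 10, and 25 land in partitions 0, 1, and 2. The comparison is done on the typed value rather than on the serialized bytes, which is the reason the design needs a Deserializer in the partitioner.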
Re: Is anybody working on the globally "order by" of hive ?
Great, I can work on this issue.

On Sat, Jun 12, 2010 at 2:02 PM, Jeff Hammerbacher wrote:

> See https://issues.apache.org/jira/browse/HIVE-1402.
>
> On Fri, Jun 11, 2010 at 1:22 PM, John Sichi wrote:
>
>> If someone is interested in adding parallel ORDER BY to Hive (using
>> TotalOrderPartitioner), here's a good starting point:
>>
>> http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad
>>
>> The goal would be to take that manual two-step sample-then-sort process and
>> turn it into an automatic plan within Hive. I have a better example for the
>> sampling query which I haven't published yet.
>>
>> We would also need to name the final output files in such a way that the
>> total order could be iterated via the filenames.
>>
>

--
Best Regards

Jeff Zhang
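The two points in the quoted message (sample first to find range boundaries, then name the output files so that filename order matches key order) can be sketched roughly as below. This is a minimal illustration, not the unpublished sampling query mentioned above and not Hive code. The class SampleSplitSketch, both method names, the int key type, and the "part-%05d" naming pattern are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Locale;

/** Sketch only: derives range boundaries from a sample and names output files. */
final class SampleSplitSketch {

    /**
     * Picks numPartitions - 1 split points at evenly spaced positions of the
     * sorted sample, so each range should receive a similar share of the keys
     * if the sample is representative.
     */
    static int[] chooseSplitPoints(int[] sample, int numPartitions) {
        if (sample.length == 0 || numPartitions < 1) {
            throw new IllegalArgumentException("need a non-empty sample and at least one partition");
        }
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            // Index of the i-th quantile boundary; always < sorted.length because i < numPartitions.
            splits[i - 1] = sorted[(int) ((long) i * sorted.length / numPartitions)];
        }
        return splits;
    }

    /**
     * Hypothetical naming scheme: zero-padding makes lexicographic filename
     * order agree with partition order (for up to 100000 partitions).
     */
    static String outputFileName(int partition) {
        return String.format(Locale.ROOT, "part-%05d", partition);
    }
}
```

For the sample {50, 10, 40, 20, 30, 60} and three partitions, this yields the split points {30, 50}. Listing the resulting files in filename order then visits the ranges in ascending key order, provided each file is itself sorted.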
Is anybody working on the globally "order by" of hive ?
Hi all,

According to the Hive wiki, Hive does not support a global "order by"; its "sort by" only sorts within each reducer. Our team thinks a global "order by" is an important feature for users, so I am wondering whether anybody is working on it. I would be very interested in getting involved.

--
Best Regards

Jeff Zhang
Suggest Hive to provide a simple java api for unit test in local mode
Hi all,

I'd like to debug Hive programs using Hive's raw Java API. I know there is a Thrift API for Hive, but it is not convenient for me, especially for debugging and unit testing. I also noticed that the unit test in Hive (TestExecDriver) uses a shim (it calls ExecDriver in another process), which is not convenient for testing either.

I did some hacking, and the following code finally executes successfully (I created the table before running it). So I'd like to suggest that Hive provide a simpler Java API for users (a wrapper around Hive's internal Java API) and let users choose to call ExecDriver directly instead of going through the shim.

// code snippet
// Compile the query; the snippet then reads the plan back from queryplan.xml.
HiveConf conf = new HiveConf(ExecDriver.class);
Driver driver = new Driver(conf);
driver.compile("select name from test group by name");

// substring(5) presumably strips a leading "file:" prefix from the scratch dir.
String planPath = driver.ctx.getLocalScratchDir().substring(5)
    + File.separator + "queryplan.xml";
QueryPlan plan = Utilities.deserializeQueryPlan(new FileInputStream(planPath), conf);

// Run the first root task in this JVM through ExecDriver, bypassing the shim.
Task task = plan.getRootTasks().get(0);
ExecDriver eDriver = new ExecDriver((MapredWork) task.getWork(), new JobConf(), false);
eDriver.execute(new DriverContext());

--
Best Regards

Jeff Zhang
Re: Will hive support PL/SQL?
I think PL/SQL is a kind of DSL built on top of SQL, so it should be possible to build something similar on top of HSQL. A Ruby DSL may be one option.

On Mon, Feb 8, 2010 at 4:20 PM, Mafish Liu wrote:

> Hive supports SQL-like language named HSQL.
>
> Refer http://wiki.apache.org/hadoop/Hive/LanguageManual for more details.
>
> 2010/2/8 jian yi :
> > Hi all,
> >
> > PL/SQL is very convenient, will hive support it?
> >
> > Regards
> > Jian Yi
> >
>
> --
> maf...@gmail.com
>

--
Best Regards

Jeff Zhang
Where can I start ?
Hi all,

I have used Pig for several months. Now I am very interested in Hive and would like to investigate it and compare these two amazing products. I also hope I can contribute to Hive.

When I was learning Pig, I would write a Pig script and debug it in local mode, which let me see the internal mechanisms. Hive, however, does not seem to have a straightforward Java API like Pig's. So how can I learn Hive as a developer rather than as a user? Could anyone give me some suggestions?

Thank you.

Jeff Zhang