Re: [PROPOSAL] Hivemall incubation

Makoto Yui Fri, 21 Nov 2014 11:51:10 -0800

Hi Nick,

Thank you for the comments.


(2014/11/22 3:42), Nick Dimiduk wrote:

I would also encourage you to consider joining forces with DataFu,
rather than "competing". I think there's a real appetite a wholistic
toolbox of patterns and implementations that can span these projects.
 From my understanding, there's nothing about DataFu that's unique to
Pig, they just need the work done to abstract away the Pig bits and
implement the Hive interfaces.

My current understanding of DataFu is that it is a collection of UDFs for Apache Pig. Although DataFu does not yet support Hive interfaces, is extending DataFu for Hive a direction that the DataFu community has agreed on?

My concern is that merging the Hivemall codebase into DataFu would make DataFu's build and packaging process complex and the target/objective of the project unclear.


I do not think that Hivemall competes with DataFu because
1) There are users who prefer Pig and Hive respectively, and

2) Pig/DataFu is useful for tasks that HiveQL is unsuited to (e.g., complex feature engineering steps). After preprocessing with DataFu, Hivemall can be applied for classification/regression in a scalable way in Hive.

Is there anything about Hivemall that's unique to Hive, that wouldn't be
applicable to Pig as well?

The techniques used in Hivemall (e.g., training data amplification that emulates iterative training, and machine learning algorithms implemented as table-generating functions) could be applicable to Apache Pig.

However, I am not a heavy user of Pig, and porting Hivemall to Pig would require a lot of work. So, I am currently planning to stick with HiveQL interfaces (Hive, HCatalog, and Tez as the software stack of Hivemall) in developing Hivemall, because a SQL-like interface is friendly to a broader range of developers.


Thanks,
Makoto

--
*******************************************
Makoto YUI <m....@aist.go.jp>
Information Technology Research Institute, AIST.
https://staff.aist.go.jp/m.yui/index_e.html
*******************************************
