On Thu, Nov 15, 2012 at 8:14 AM, Victor Iacoban <[email protected]> wrote:
> I'm not a clojure wizard myself but it feels like clojure REPL with crunch
> would be a terrific experimentation environment.
>
> I've tried crunch from java and I was impressed, it's very easy to connect
> non-standard sources and reasonably easy to define the flow.
>
> I tried to use cascalog for my prototyping env but although it's very good
> on flow definition, cascading lacks a lot in flexibility when you need to
> process something else except for text or sequence files.
>
> "clunch" sounds like a good name to me ;)

LOL. "Clutch" has a nice ring to it. ;-)

>
> -- victor
>
>
> On Thu, Nov 15, 2012 at 10:58 AM, Joseph Adler <[email protected]> wrote:
>
>> Personally, I'd love to see Crunch mixed with Clojure. I was thinking about
>> this myself, but I'd rather see someone who really knows Clojure take this
>> on.
>>
>> Just don't call it Clunch.
>>
>> -- Joe
>>
>>
>> On Thu, Nov 15, 2012 at 5:04 AM, Victor Iacoban <[email protected]> wrote:
>>
>> > Thanks Josh, will give this a try
>> >
>> >
>> > On Wed, Nov 14, 2012 at 9:54 PM, Josh Wills <[email protected]> wrote:
>> >
>> > > I'm always glad to help people to extend Crunch in ways that are useful
>> > > for them. I think that most things that involve type-related extensions
>> > > can be handled using the PTypes.derived() function, which can be used to
>> > > create custom PTypes that are mapped to underlying serialized types, so
>> > > that you could do something like
>> > >
>> > > // Forgive my syntax errors, I'm doing this w/o an IDE
>> > > PType<Object> objectType = PTypes.derived(Object.class, new
>> > > InputMapFn<BytesWritable, Object>(), new OutputMapFn<Object,
>> > > BytesWritable>(), Writables.writables(BytesWritable.class));
>> > >
>> > > ...which is essentially how Scrunch works: the PTypes { } functionality
>> > > in Scrunch maps from Scala types to Java types using the derived
>> > > functionality.
>> > >
>> > > The Converter stuff is internal to Avro and Writable, I can't think of a
>> > > case where that would need to be exposed outside the package (i.e., once
>> > > you've decided on whether to use Writables or Avro as your serialization
>> > > framework, the choice of Converter is fixed.)
>> > >
>> > > If you have a use case where the derived type can't handle the
>> > > conversion or is a poor choice for whatever reason, I'm all about having
>> > > a discussion and trying out different designs.
>> > >
>> > > Josh
>> > >
>> > >
>> > > On Wed, Nov 14, 2012 at 6:18 PM, Victor Iacoban <[email protected]> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I'm very interested in writing a wrapper library around Apache Crunch
>> > > > for Clojure, something similar to existing Scrunch.
>> > > > How do you recommend to start?
>> > > >
>> > > > I was looking through Crunch code and it looks like I can pretty
>> > > > easily integrate it in clojure by adding some custom WritableType type.
>> > > > Something like WritableType<Object, ByteWritable> with a custom
>> > > > converter or inputFn/outputFn functions.
>> > > >
>> > > > Regretfully there are several issues with this approach and instead
>> > > > I'd have to duplicate all those type classes for a new type set
>> > > > * WritableType has a package visible constructor so I cannot extend it
>> > > >   and cannot instantiate it
>> > > > * Converter is instantiated inside WritableType constructor so in case
>> > > >   I need a different converter I'm stuck
>> > > > * Writables has a factory method for WritableType but it's private
>> > > > * it looks like there is an attempt to support additional
>> > > >   WritableTypes through EXTENSIONS in Writables but it would only work
>> > > >   for cases where in WritableType<T, W> both T and W are hadoop
>> > > >   writables
>> > > >
>> > > > So what do you think is a best solution, is it possible to open up the
>> > > > api to support custom WritableTypes or the only option for me is to
>> > > > implement a new ClojurePType and all related classes?
>> > > >
>> > > > Hope I'm not too detailed, but at this stage you all are probably very
>> > > > familiar with the code
>> > > >
>> > > > Thanks,
>> > > > Victor
>> > > >
>> > >
>> >
>>

--
Director of Data Science
Cloudera
Twitter: @josh_wills
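
For readers following the thread, here is a minimal sketch of the derived-type idea Josh describes: a new type is built from an existing serialized base type plus an input function and an output function. It is written in plain Java so that it runs without Crunch or Hadoop on the classpath. `SimpleType`, `BYTES`, and `derived` are hypothetical stand-ins invented for this illustration; they are not Crunch's `PType`, `Writables`, or `PTypes.derived()`, and the real signatures may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class DerivedTypeSketch {
    /** Hypothetical stand-in for a serializable type; NOT Crunch's PType. */
    public interface SimpleType<T> {
        byte[] write(T value);
        T read(byte[] bytes);
    }

    /** Base type that stores raw bytes as they are (assumed for illustration). */
    public static final SimpleType<byte[]> BYTES = new SimpleType<byte[]>() {
        public byte[] write(byte[] value) { return value.clone(); }
        public byte[] read(byte[] bytes) { return bytes.clone(); }
    };

    /**
     * Builds a type for T on top of an existing type for S:
     * writing applies outputFn then the base writer,
     * reading applies the base reader then inputFn.
     */
    public static <S, T> SimpleType<T> derived(
            Function<S, T> inputFn, Function<T, S> outputFn, SimpleType<S> base) {
        return new SimpleType<T>() {
            public byte[] write(T value) { return base.write(outputFn.apply(value)); }
            public T read(byte[] bytes) { return inputFn.apply(base.read(bytes)); }
        };
    }

    public static void main(String[] args) {
        // A String type derived from the raw-bytes base type.
        SimpleType<String> stringType = derived(
                (byte[] b) -> new String(b, StandardCharsets.UTF_8),
                (String s) -> s.getBytes(StandardCharsets.UTF_8),
                BYTES);

        byte[] wire = stringType.write("clojure");
        System.out.println(stringType.read(wire)); // prints "clojure"
    }
}
```

The point of the pattern is that the base type keeps ownership of serialization, so a wrapper for another language only has to supply the two mapping functions rather than a new type hierarchy.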
