Hi Ismael,
While 1. is probably the easiest way, I think it would still require some
changes on the Zeppelin side. AFAIK, Zeppelin directly leverages the
RDD, so it's tied to the Spark API.
So we would probably need to change the Zeppelin backend a bit to
abstract the current RDD usage behind PCollection.
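To make that idea concrete, here is a minimal sketch of what such an abstraction could look like. All names below (NotebookCollection, LocalCollection, etc.) are hypothetical, not actual Zeppelin or Beam APIs: the point is just that the interpreter would program against a backend-neutral interface, which an RDD-backed wrapper or a PCollection-backed wrapper could both implement.

```scala
// Hypothetical sketch (not real Zeppelin/Beam APIs): the interpreter
// calls this small backend-neutral interface instead of RDD methods
// directly, so a Spark RDD implementation and a Beam PCollection
// implementation can be swapped behind it.
trait NotebookCollection[T] {
  def mapElements[U](f: T => U): NotebookCollection[U]
  def filterElements(p: T => Boolean): NotebookCollection[T]
  def collectToList(): List[T]
}

// A local in-memory implementation, standing in for either backend.
final case class LocalCollection[T](data: List[T]) extends NotebookCollection[T] {
  def mapElements[U](f: T => U): NotebookCollection[U] =
    LocalCollection(data.map(f))
  def filterElements(p: T => Boolean): NotebookCollection[T] =
    LocalCollection(data.filter(p))
  def collectToList(): List[T] = data
}
```

The real work would of course be in the wrappers themselves (e.g. a PCollection-backed one would have to defer execution to a pipeline run), but an interface like this is what would let Zeppelin stop depending on the Spark API directly.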
My $0.01
Regards
JB
On 05/17/2016 03:03 PM, Ismaël Mejía wrote:
Last week, during the Apache Big Data / ApacheCon conference, I attended
some presentations, and one aspect that surprised me is how Apache
Zeppelin was used by many presenters to show their data processing code
(mostly in Python/Scala).
I consider that even if this integration is not critical for Apache Beam,
it is important to support it, and I intend to collaborate on this task.
I just created a JIRA issue for the people interested:
https://issues.apache.org/jira/browse/BEAM-290
I briefly discussed with Alexander Bezzubov from Zeppelin an initial
plan to support Beam in three phases:
1. Support the Scala SDK (scio) + Scala runners (Spark):
This comes first since most of the pieces already exist; we just need to
put them together.
2. Integrate the Java SDK:
The big issue here is that there is not (yet) a decent Java REPL, and
support for such a REPL in Zeppelin is ongoing work.
3. Integrate the Python SDK:
This one depends on the release of the Python SDK in the upcoming weeks,
and its priority can change if the integration turns out to be easier
than the other two tasks.
Of course, this message is a call for other interested parties to
contribute, e.g. with ideas, an agenda to prioritize certain runners, or
other complementary tasks towards these goals, such as integrating scio
or supporting the Google Storage backend for the notebooks (to make a
nicer integration for users of the runner on Google Cloud), etc.
Ismaël Mejía
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com