Re: Python SQL over Pandas Dataframe, was - Re: [GSoC 2016] Notebooks

Alexander Bezzubov Thu, 28 Jul 2016 23:56:42 -0700

Hi Paul,

it definitely looks like a bug and the right fix to me!


Could you please create a JIRA issue and submit a PR with the fix?
I think it is a very valuable contribution, thank you!
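
For reference, here is a minimal sketch of the cast described in the quoted message below. It is not the Zeppelin source. It assumes the DataFrame is flattened into tab-separated text through two `StringIO` buffers and that the failure comes from writing non-string values such as integer column names. The function names `to_table_text` and `write_without_cast` are hypothetical; only `header_buf`, `body_buf`, `col`, `row` and `cell` come from the snippet in the thread.

```python
import io


def to_table_text(columns, rows):
    """Render a non-empty header and rows as tab-separated text.

    Every value is passed through str() before being written, because
    io.StringIO.write() accepts only str and raises TypeError otherwise.
    The tab-separated layout is an assumption made for illustration.
    """
    header_buf = io.StringIO()
    body_buf = io.StringIO()

    # Header: first column name, then the remaining names separated by tabs.
    header_buf.write(str(columns[0]))
    for col in columns[1:]:
        header_buf.write("\t")
        header_buf.write(str(col))

    # Body: one line per row, cells separated by tabs.
    for row in rows:
        body_buf.write("\n")
        body_buf.write(str(row[0]))
        for cell in row[1:]:
            body_buf.write("\t")
            body_buf.write(str(cell))

    return header_buf.getvalue() + body_buf.getvalue()


def write_without_cast(value):
    """Return True if StringIO accepts the value as-is, False on TypeError."""
    buf = io.StringIO()
    try:
        buf.write(value)
    except TypeError:
        return False
    return True
```

With this sketch, `write_without_cast(1)` fails while `to_table_text(["name", 1], [["a", 2.5]])` succeeds, which is consistent with the behaviour reported in the thread when the values are cast first.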

--
Alex

On Fri, Jul 29, 2016 at 3:45 PM, Paul Bustios Belizario <[email protected]>
wrote:

> Hi Alex,
>
> Now, with that problem solved, another one appeared. I tried to reproduce
> this example and got an error with the type of data:
> https://db.tt/fhfzlWGS
>
> I tested it with Python 2 and 3, but the error persists. So I thought of
> casting the values to strings when the content of the DataFrame is written
> to the StringIO stream:
>
> header_buf.write(str(df.columns[0]))
> header_buf.write(str(col))
> body_buf.write(str(row[0]))
> body_buf.write(str(cell))
>
> With such changes, the problem was solved: https://db.tt/afrCORIU
>
> I'm not sure how the error can be reproduced because I just tried to
> reproduce an example. Should it be considered a bug?
>
> Regards,
> Paul
>
> On Thu, Jul 28, 2016 at 6:52 AM Alexander Bezzubov <[email protected]> wrote:
>
> > Hi Paul,
> >
> > note the subject change.
> >
> > This is definitely a bug I can reproduce in the default configuration!
> > Thank you for reporting it. I have logged it under ZEPPELIN-1244 [1] and
> > attached a hotfix.
> >
> >  1. https://issues.apache.org/jira/browse/ZEPPELIN-1244
> >
> > --
> > Alex
> >
> > On Thu, Jul 28, 2016 at 11:45 AM, Paul Bustios Belizario <
> > [email protected]
> > > wrote:
> >
> > > Hi Alexander,
> > >
> > > > Yes, I'm using the latest version of the code in the master branch and I
> > > > have installed pandas and pandasql.
> > > >
> > > > By the way, I searched the repository. Below are the two screenshots of
> > > > the search results for:
> > >
> > > > PythonPandasSQL*Interpreter*
> > > >
> > > > https://dl.dropboxusercontent.com/u/20947972/search_pandassql_interpreter.png
> > > >
> > > > Python*Interpreter*PandasSQL
> > > >
> > > > https://dl.dropboxusercontent.com/u/20947972/search_interpreter_pandassql.png
> > >
> > > Attached error log.
> > >
> > > Regards,
> > > Paul
> > >
> > > On Mon, Jul 25, 2016 at 10:44 PM Alexander Bezzubov <[email protected]>
> > > wrote:
> > >
> > >> Hi Paul,
> > >>
> > >> this sounds very strange indeed.
> > >>
> > > >> Please make sure you are using the latest master. To get the correct
> > > >> interpreter class names, it should be enough to delete
> > > >> /conf/interpreter-settings.json and restart Zeppelin; the file will be
> > > >> re-created.
> > > >>
> > > >> Regarding the dependencies needed to run %python.sql (its implementation
> > > >> is PythonInterpreterPandasSql), please refer to [1] and make sure the
> > > >> prerequisites are installed for your system's Python (or the one that is
> > > >> configured through the interpreter settings UI).
> > > >>
> > > >> If nothing helps, please feel free to file a Jira issue with a description
> > > >> of how the error can be reproduced and I will be happy to help you and
> > > >> look more into it!
> > >>
> > >> 1.
> > >>
> > >>
> >
> https://github.com/apache/zeppelin/blob/master/docs/interpreter/python.md#sql-over-pandas-dataframes
> > >>
> > >> --
> > >> Alex
> > >>
> > >> On Tue, Jul 26, 2016, 09:44 Paul Bustios Belizario <
> [email protected]>
> > >> wrote:
> > >>
> > >> > Hi Alexander,
> > >> >
> > > >> > Yes. I knew that, but for some reason that I'm still investigating,
> > > >> > z.show() doesn't display the dataframe in my notebook. That's why I
> > > >> > decided not to incorporate z.show() yet. As soon as I find the problem
> > > >> > I will add it.
> > >> >
> > > >> > Regarding pandasql, there is an error creating the interpreter:
> > > >> >
> > > >> > java.lang.ClassNotFoundException:
> > > >> > org.apache.zeppelin.python.PythonPandasSqlInterpreter
> > > >> >
> > > >> > I couldn't find the PythonPandasSqlInterpreter class; I only found
> > > >> > PythonInterpreterPandasSql.java in the latest version of the code. Is
> > > >> > there anything more I need to do to get this interpreter ready? Do I
> > > >> > have to change the class name of the interpreter in the
> > > >> > interpreter-setting.json?
> > >> >
> > >> > Regards,
> > >> > Paul
> > >> >
> > >> > On Sun, Jul 24, 2016 at 11:44 PM Alexander Bezzubov <[email protected]
> >
> > >> > wrote:
> > >> >
> > >> > > Thanks for sharing your progress Paul, the notebook looks great!
> > >> > >
> > >> > > By the way, did you know that in latest Apache Zeppelin instead of
> > >> > > ```
> > >> > > print(titanic.head())
> > >> > > ```
> > >> > > one can use
> > >> > >
> > >> > > ```
> > >> > > z.show(titanic)
> > >> > > ```
> > >> > > ?
> > >> > >
> > > >> > > It would be a good opportunity to showcase this [1] and other features
> > > >> > > of the Python interpreter, like the recent SQL over Pandas DataFrame
> > > >> > > with built-in visualizations for easy exploratory analysis [2],
> > > >> > > through this work. What do you think?
> > >> > >
> > >> > > 1.
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> http://zeppelin.apache.org/docs/0.6.0/interpreter/python.html#pandas-integration
> > >> > > 2.
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/zeppelin/blob/master/docs/interpreter/python.md#sql-over-pandas-dataframes
> > >> > > --
> > >> > > Alex
> > >> > >
> > >> > > On Sat, Jul 23, 2016, 12:54 Paul Bustios Belizario <
> > >> [email protected]>
> > >> > > wrote:
> > >> > >
> > >> > > > Thanks Moon,
> > >> > > >
> > >> > > > Here is my third notebook using the Titanic dataset:
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://www.zeppelinhub.com/viewer/notebooks/bm90ZTovL2J1c3Rpb3MvbG9jYWwvYmI0Y2EwNjVkMTI1NDY2Y2EzNTIzNThiZjViYzIxOWQvbm90ZS5qc29u
> > >> > > >
> > > >> > > > Now, I'm working on the fourth notebook and updating my first
> > > >> > > > notebook to use z.show()
> > >> > > >
> > >> > > > Regards,
> > >> > > > Paul
> > >> > > >
> > >> > > > On Sat, Jul 16, 2016 at 7:42 PM moon soo Lee <[email protected]>
> > >> wrote:
> > >> > > >
> > >> > > > > Hi Paul,
> > >> > > > >
> > > >> > > > > That would be very interesting!
> > > >> > > > > And like you mentioned, it's a dataset for starters. I think it's
> > > >> > > > > super reasonable to have notebooks with that data.
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > moon
> > >> > > > >
> > >> > > > > On Sat, Jul 9, 2016 at 11:09 AM Paul Bustios Belizario <
> > >> > > > [email protected]
> > >> > > > > >
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hi community,
> > >> > > > > >
> > > >> > > > > > I was searching some databases and chose [1,2] for the next
> > > >> > > > > > notebooks. These databases are not big, but are classic and
> > > >> > > > > > educational for people who are starting the path of data
> > > >> > > > > > science. Additionally, through the process of machine learning,
> > > >> > > > > > these databases can provide many graphics.
> > >> > > > > >
> > >> > > > > > What do you think?
> > >> > > > > >
> > >> > > > > > Regards,
> > >> > > > > > Paul
> > >> > > > > >
> > >> > > > > > [1] https://www.kaggle.com/c/titanic
> > >> > > > > > [2] https://www.kaggle.com/c/digit-recognizer
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>
