Re: Can't download moderately large data or number of rows to csv

2017-05-03 Thread Paul Brenner
I’m not sure what the best solution is, but I created a ticket here:

https://share.polymail.io/v1/z/b/NTkwYTM4NTgzMzIy/ptA3bo_BAIo9IWGz0OXooezKKqlB7FL6rPYuPfHCNnGvRz-yUxCoYMxiNmygRARAMgtzeZ4jz5UxoPQtQlYe-nLRtaBMkhFwn2t7rMLPwtJuDIDVDy0E_azvjPZDVrjRLGkL40kqM-qpxMg6BgBzUgcrawJMQ7dnfV93mVHjjMxqbM4r9K-k5eXP9dX4T5JgwSKXPpVopDZn19r-bP671LA_2MU4-_Vh

Re: Can't download moderately large data or number of rows to csv

2017-05-03 Thread Rick Moritz
I think whether this is an issue or not depends a lot on how you use
Zeppelin and what tools you need to integrate with. Sadly, Excel is still
around as a data processing tool, and many people I introduce to Zeppelin
are quite proficient with it, hence the desire to export to CSV in a
trivial manner -- or merely the presence of the "download CSV" button leads
them to expect it to work for reasonably sized data (i.e., up to around
10^6 rows).

I do prefer Ruslan's idea, but I think Zeppelin should include something
similar out of the box. The key requirement is that the data shouldn't have
to travel through the notebook interface; instead, it should be made
available in a temporary folder and then served via a download link. The
downside to this approach is that you'd ideally want the operation to be
interpreter-agnostic, in which case every interpreter would need to offer
an interface for collecting the data into a local-to-Zeppelin temporary
folder.

Nonetheless, to turn Zeppelin into the serve-it-all solution that it could
be, I do believe that "fixing" the CSV export is important. I'd definitely
vote for a Jira advancing this issue.
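
As an interim sketch of that approach in a %spark paragraph, while no
built-in mechanism exists: everything below is an assumption -- the export
directory, the host name, and a separate web server (e.g. nginx) serving
that folder over HTTP. None of it is Zeppelin API.

    // Hedged sketch, not Zeppelin API: write the full result set to a folder
    // local to the Zeppelin server and print a plain download link, so the
    // rows never travel through the notebook front end.
    // Assumptions: `spark` is the SparkSession Zeppelin provides, and
    // /var/zeppelin/exports is served over HTTP by e.g. nginx.
    val df = spark.range(1000000L).selectExpr("id", "id * 2 AS doubled") // stand-in result
    val name = s"result-${System.currentTimeMillis}"
    df.coalesce(1)                                    // one CSV part file instead of many
      .write.option("header", "true")
      .csv(s"file:///var/zeppelin/exports/$name")
    println(s"Download: https://zeppelin-host/exports/$name/")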

Re: Can't download moderately large data or number of rows to csv

2017-05-02 Thread Kevin Niemann
We came across this issue as well. Zeppelin's CSV export uses the data URI
scheme, which base64-encodes all the rows into a single string. Chrome
seems to crash beyond a few thousand rows, though Firefox has handled over
100k for me; however, the Zeppelin notebook itself becomes slow at that
point. I would also like better support for exporting a large set of rows,
though perhaps another tool is better suited to that?
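
To put numbers on why that single string hurts: base64 emits four output
characters for every three input bytes, and the whole export has to sit in
browser memory as one data: URI. A back-of-the-envelope estimate in Scala
(the 100-byte average row width is an assumption):

    // Illustrative arithmetic only: size of the single base64 string behind
    // a "data:text/csv;base64,..." link for nRows rows of avgRowBytes each.
    val nRows = 100000                            // the ceiling Firefox managed
    val avgRowBytes = 100L                        // assumed average CSV row width
    val rawBytes = nRows * avgRowBytes
    val base64Bytes = 4L * ((rawBytes + 2) / 3)   // 4 output chars per 3 input bytes
    println(f"~${base64Bytes / 1e6}%.1f MB held in one in-memory URI string") // ~13.3 MB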

Re: Can't download moderately large data or number of rows to csv

2017-05-02 Thread Ruslan Dautkhanov
Good idea to introduce in Zeppelin a way to download full datasets without
actually visualizing them.

Not sure if this helps, but we taught our users to use

    %sh hadoop fs -getmerge /hadoop/path/dir/ /some/nfs/mount/

for large files (they sometimes have to download datasets with millions of
records). They run Zeppelin on edge nodes that have NFS mounts to a drop
zone.

ps. Hue has a limit too: 100k rows by default.
https://github.com/cloudera/hue/blob/release-3.12.0/desktop/conf.dist/hue.ini#L905
Not sure how much it scales up.
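
The producing side of that workflow might look like the sketch below (the
query and paths are placeholders): Spark materializes the full result on
HDFS, and the %sh getmerge step then lands it on the NFS drop zone as one
file.

    // Hedged companion to the %sh step above: materialize the full query
    // result on HDFS first. Query and path are placeholders.
    val result = spark.sql("SELECT * FROM some_table")  // full dataset, never rendered
    result.write.option("header", "true").csv("/hadoop/path/dir/")
    // Caveat: with header=true every part file repeats the header row, and
    // getmerge concatenates the parts verbatim -- strip duplicates if needed.
    // Then, in a %sh paragraph:
    //   hadoop fs -getmerge /hadoop/path/dir/ /some/nfs/mount/result.csv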



-- 
Ruslan Dautkhanov

Can't download moderately large data or number of rows to csv

2017-05-02 Thread Paul Brenner
There are limits to how much data the download-to-CSV button will download
(1.5MB? 3500 rows?), and they limit Zeppelin's usefulness for our BI teams.
This limit comes up far before we run into issues with showing too many
rows of data in Zeppelin.

Unfortunately (fortunately?) Hue is the other tool the BI team has been
using, and there they have no problem downloading much larger datasets to
CSV. This is definitely not a requirement I've ever run into in the way I
use Zeppelin, since I would just use Spark to write the data out. However,
the BI team is not allowed to run Spark jobs (they use Hive via JDBC), so
that download-to-CSV button is pretty important to them.

Would it be possible to significantly increase the limit? Even better,
would it be possible to download more data than is shown? I assume this is
the type of thing I would need to open a ticket for, but I wanted to ask
here first.
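
One possible stopgap for JDBC-only users, sketched below, would be to pull
rows over Hive JDBC inside a paragraph and stream them straight to a file
on the Zeppelin host, bypassing the browser limit entirely. The connection
URL, credentials, query, and output path are all placeholders, and real CSV
quoting is omitted for brevity.

    // Hedged sketch: fetch over Hive JDBC and write CSV server-side, never
    // rendering rows in the notebook. Placeholders: URL, user, query, path.
    // Values containing commas/quotes/newlines would need real CSV escaping.
    import java.io.PrintWriter
    import java.sql.DriverManager

    val conn = DriverManager.getConnection(
      "jdbc:hive2://hive-host:10000/default", "user", "")
    val rs   = conn.createStatement.executeQuery("SELECT * FROM some_table")
    val md   = rs.getMetaData
    val cols = 1 to md.getColumnCount
    val out  = new PrintWriter("/tmp/export.csv")
    out.println(cols.map(i => md.getColumnName(i)).mkString(","))     // header row
    while (rs.next()) out.println(cols.map(i => rs.getString(i)).mkString(","))
    out.close(); conn.close()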
