Github user doanduyhai commented on the pull request:
https://github.com/apache/incubator-zeppelin/pull/80#issuecomment-108464392
@Leemoonsoo
Thank you for your interesting questions. Indeed I see 2 main points here:
1. How to guarantee that the end-user do not fetch a massive amount of data
from RDD to the Driver programe, e.g. add some mechanism to limit the amount of
fetched data for display
2. Normalize the `showRDD()` and `displayAsTable()` code to put them at the
same place and not spread over
So, my answers:
1.Default limit for fetched data
For this we can take the same approach as the one taken by `showRdd()`:
```scala
val defaultMaxLimit = 10000
def displayAsTable(maxLimit: Long, columnsLabel: String*) {...}
def displayAsTable(columnsLabel: String*) {
displayAsTable(defaultMaxLimit, columnsLabel)
}
```
The above solution works, but I propose another approach. In my last
commit, the `displayAsTable()` method also works for plain **Scala collection**
so I suggest to remove the support from RDD. If the end-user want to use
`displayAsTable()`, **they must convert their RDD first to a Scala
collection**. In some way, we force the end-user to call either `collect()` or
`take(n)` **explicitly**.
Example:
```scala
rdd
.map(...)
.filter(...)
.take(30) // User responsibility
.displayAsTable(...)
```
This way, we put the responsibility of collecting data explicitly on the
end-user, no surprise, what do you think ?
2. Normalize the `showRDD()` and `displayAsTable()` code
I had a look into the **[ZeppelinContext]** code and basically it is
dealing with Scala code in Java. What we can do is:
a. either put the `displayAsTable()` method inside this
**`ZeppelinContext`** class and code everything in Java but then we loose the
power of Scala implicit conversion. We'll need to do:
```
val collection = rdd.map(...).filter(...).take(10)
z.displayAsTable(collection,"Header1","Header2",...)
```
b. or we port the code of **`ZeppelinContext`** in Scala directly so that
both `displayAsTable()` and `showRDD()` are written directly in **Scala**. It
would make sense because lots of the code in this class is dealing already with
Scala code and it would avoid us converting back and forth between Java and
Scala:
I volunteer to port this class from Java to Scala (with unit tests of
course). What do you think @Leemoonsoo ?
[ZeppelinContext]:
https://github.com/apache/incubator-zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/ZeppelinContext.java
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---