[GitHub] incubator-zeppelin pull request: Add Scala utility functions for d...

doanduyhai Wed, 03 Jun 2015 07:46:14 -0700

Github user doanduyhai commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/80#issuecomment-108464392
  
    @Leemoonsoo 
    
    Thank you for  your interesting questions. Indeed I see 2 main points here:
    
    1. How to guarantee that the end-user do not fetch a massive amount of data 
from RDD to the Driver programe, e.g. add some mechanism to limit the amount of 
fetched data for display
    
    2. Normalize the `showRDD()` and `displayAsTable()` code to put them at the 
same place and not spread over
    
    
    So, my answers:
    
    1.Default limit for fetched data
    
     For this we can take the same approach as the one taken by `showRdd()`: 
     
    ```scala
      val defaultMaxLimit = 10000 
    
      def displayAsTable(maxLimit: Long, columnsLabel: String*) {...}
    
      def displayAsTable(columnsLabel: String*) {
          displayAsTable(defaultMaxLimit, columnsLabel)
      }
    ```
     The above solution works, but I propose another approach. In my last 
commit, the `displayAsTable()` method also works for plain **Scala collection** 
so I suggest to remove the support from RDD. If the end-user want to use 
`displayAsTable()`, **they must convert their RDD first to a Scala 
collection**. In some way, we force the end-user to call either `collect()` or 
`take(n)` **explicitly**.
    
    Example:
    
    ```scala
    rdd
    .map(...)
    .filter(...)
    .take(30) // User responsibility
    .displayAsTable(...)
    ```
    
     This way, we put the responsibility of collecting data explicitly on the 
end-user, no surprise, what do you think ?
     
    2. Normalize the `showRDD()` and `displayAsTable()` code 
    
    I had a look into the **[ZeppelinContext]** code and basically it is 
dealing with Scala code in Java. What we can do is:
    
     a. either put the `displayAsTable()` method inside this 
**`ZeppelinContext`** class and code everything in Java but then we loose the 
power of Scala implicit conversion. We'll need to do:
    
    ```
    val collection = rdd.map(...).filter(...).take(10)
    z.displayAsTable(collection,"Header1","Header2",...)
    ```
    
     b. or we port the code of **`ZeppelinContext`** in Scala directly so that 
both `displayAsTable()` and `showRDD()` are written directly in **Scala**. It 
would make sense because lots of the code in this class is dealing already with 
Scala code and it would avoid us converting back and forth between Java and 
Scala:
    
    I volunteer to port this class from Java to Scala (with unit tests of 
course). What do you think @Leemoonsoo  ?
    
    [ZeppelinContext]: 
https://github.com/apache/incubator-zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/ZeppelinContext.java




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add Scala utility functions for d...

Reply via email to