[ https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305990#comment-15305990 ]

ASF GitHub Bot commented on MAHOUT-1866:
----------------------------------------

Github user andrewpalumbo commented on a diff in the pull request:

    https://github.com/apache/mahout/pull/237#discussion_r65007959
  
    --- Diff: math-scala/src/main/scala/org/apache/mahout/math/drm/package.scala ---
    @@ -148,6 +148,42 @@ package object drm {
       def drmSampleKRows[K](drmX: DrmLike[K], numSamples: Int, replacement: Boolean = false): Matrix =
         drmX.context.engine.drmSampleKRows(drmX, numSamples, replacement)
     
    +  /**
    +    * Convert a sampled DRM into a Tab Separated Vector (TSV) to be loaded into an R-DataFrame
    +    * for plotting and sketching
    +    * @param drmX - DRM
    +    * @param samplePercent - Percentage of Sample elements from the DRM to be fished out for plotting
    +    * @tparam K
    +    * @return TSV String
    +    */
    +  def sampleMatrixToTSV[K](drmX: DrmLike[K], samplePercent: Double = 1): String = {
    +
    --- End diff --
    
    Minor point: maybe rename to `drmSampleToTSV` or something along those
    lines, so that it is obvious that it's a DRM and not a Matrix?  Other than
    that, +1 from me.


> Add matrix-to-tsv string function
> ---------------------------------
>
>                 Key: MAHOUT-1866
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1866
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: visiualization
>    Affects Versions: 0.12.1
>            Reporter: Trevor Grant
>            Assignee: Suneel Marthi
>             Fix For: 0.13.0
>
>
> Need a function to convert a matrix to a TSV string, which can then be:
> - plotted by Zeppelin %table visualization packages
> - passed to R / Python via the Zeppelin Resource Manager
> It has been noted that a matrix can be registered as an RDD and passed across
> contexts directly in Spark; however, this breaks the 'backend agnostic'
> philosophy.  Until H2O and Flink both also support Python / R environments, it
> is more reasonable to use tab-separated-value strings.
> Further, matrices might be extremely large and unfit for direct conversion
> to TSVs.  It may be wise to introduce some sort of safety valve to prevent
> excessively large matrices from being materialized into local memory (e.g.
> supposing the user hasn't called their own sampling method on a matrix).
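
The "safety valve" idea in the description above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: it uses a plain `Seq[Array[Double]]` as a stand-in for an already-collected matrix instead of a Mahout `DrmLike[K]`, and the names `TsvSketch`, `rowsToTsv`, `maxRows` and `DefaultMaxRows` are all hypothetical.

```scala
// Hypothetical sketch: none of these names come from Mahout or the PR.
object TsvSketch {

  /** Assumed default cap on the number of rows turned into text. */
  val DefaultMaxRows: Int = 1000

  /**
    * Render at most `maxRows` rows as a tab-separated string,
    * one matrix row per line. Rows beyond the cap are dropped,
    * which is the "safety valve" in this sketch.
    */
  def rowsToTsv(rows: Seq[Array[Double]], maxRows: Int = DefaultMaxRows): String = {
    require(maxRows >= 0, "maxRows must be non-negative")
    rows.iterator
      .take(maxRows)          // stop before materializing more rows than allowed
      .map(_.mkString("\t"))  // columns separated by tabs
      .mkString("\n")         // rows separated by newlines
  }
}
```

For example, `TsvSketch.rowsToTsv(Seq(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0)), maxRows = 2)` keeps only the first two rows. Whether to truncate like this, sample as the pull request does with `samplePercent`, or fail outright on oversized input is a design choice this sketch does not settle.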



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
