[ 
https://issues.apache.org/jira/browse/MAHOUT-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039798#comment-14039798
 ] 

ASF GitHub Bot commented on MAHOUT-1583:
----------------------------------------

Github user sscdotopen commented on a diff in the pull request:

    https://github.com/apache/mahout/pull/20#discussion_r14049574
  
    --- Diff: spark/src/main/scala/org/apache/mahout/sparkbindings/blas/CbindAB.scala ---
    @@ -0,0 +1,95 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.mahout.sparkbindings.blas
    +
    +import org.apache.log4j.Logger
    +import scala.reflect.ClassTag
    +import org.apache.mahout.sparkbindings.drm.DrmRddInput
    +import org.apache.mahout.math._
    +import scalabindings._
    +import RLikeOps._
    +import org.apache.mahout.math.drm.logical.OpCbind
    +import org.apache.spark.SparkContext._
    +
    +/** Physical cbind */
    +object CbindAB {
    +
    +  private val log = Logger.getLogger(CbindAB.getClass)
    +
    +  def cbindAB_nograph[K: ClassTag](op: OpCbind[K], srcA: DrmRddInput[K], srcB: DrmRddInput[K]): DrmRddInput[K] = {
    +
    +    val a = srcA.toDrmRdd()
    +    val b = srcB.toDrmRdd()
    +    val n = op.ncol
    +    val n1 = op.A.ncol
    +    val n2 = n - n1
    +
    +    // Check if A and B are identically partitioned AND keyed. if they are, then just perform zip
    +    // instead of join, and apply the op map-side. Otherwise, perform join and apply the op
    +    // reduce-side.
    +    val rdd = if (op.isIdenticallyPartitioned(op.A)) {
    +
    +      log.debug("applying zipped cbind()")
    +
    +      a
    +          .zip(b)
    +          .map {
    +        case ((keyA, vectorA), (keyB, vectorB)) =>
    +          assert(keyA == keyB, "inputs are claimed identically partitioned, but they are not identically keyed")
    +
    +          val dense = vectorA.isDense && vectorB.isDense
    +          val vec: Vector = if (dense) new DenseVector(n) else new SequentialAccessSparseVector(n)
    --- End diff --
    
    Wouldn't it be more performant to use a RandomAccessSparseVector for the 
assign and convert it into a SequentialAccessSparseVector afterwards?
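
    A minimal sketch of what this suggestion could look like. It is not
taken from the pull request: the helper name cbindRow is hypothetical, and
the assign step is written with viewPart(...).assign(...) because the diff
above is cut off before the actual assignment. The reasoning is that a
SequentialAccessSparseVector keeps its indices in sorted arrays, so
out-of-order writes have to shift elements, while a
RandomAccessSparseVector is hash-backed. Whether this pays off in practice
depends on the order in which the source vectors are iterated.

```scala
import org.apache.mahout.math.{DenseVector, RandomAccessSparseVector, SequentialAccessSparseVector, Vector}

// Hypothetical helper: concatenates two rows into one vector of size n1 + n2.
def cbindRow(vectorA: Vector, vectorB: Vector): Vector = {
  val n1 = vectorA.size
  val n2 = vectorB.size
  val n = n1 + n2

  if (vectorA.isDense && vectorB.isDense) {
    // Dense case: write both halves straight into a dense vector.
    val vec = new DenseVector(n)
    vec.viewPart(0, n1).assign(vectorA)
    vec.viewPart(n1, n2).assign(vectorB)
    vec
  } else {
    // Sparse case: collect the writes in a hash-backed vector first ...
    val buf = new RandomAccessSparseVector(n)
    buf.viewPart(0, n1).assign(vectorA)
    buf.viewPart(n1, n2).assign(vectorB)
    // ... then convert once into the sequential-access representation.
    new SequentialAccessSparseVector(buf)
  }
}
```

    The conversion at the end costs one extra pass over the non-zero
elements, so this trade-off would need to be measured before adopting it.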


> cbind() operator for Scala DRMs
> -------------------------------
>
>                 Key: MAHOUT-1583
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1583
>             Project: Mahout
>          Issue Type: Task
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 1.0
>
>
> Another R-like operator, cbind (stitching two matrices together). It seems 
> to come up now and then. 
> Just as with elementwise operations, and perhaps some others, it will have 
> two physical implementation paths: a zip for identically distributed 
> operands, and a full join when they are not.
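
The two paths in the description can be modelled with plain Scala
collections. This is only an illustrative sketch, not the Spark code from
the pull request: the object name CbindPaths, the Row alias, and the choice
to pad a missing row with zeros in the join path are all assumptions made
for the example.

```scala
object CbindPaths {
  // (row key, row values); a simplified stand-in for a keyed DRM row.
  type Row = (Int, Vector[Double])

  // Path 1: operands are identically partitioned and keyed, so rows line up
  // positionally and can simply be zipped.
  def cbindZip(a: Seq[Row], b: Seq[Row]): Seq[Row] =
    a.zip(b).map { case ((keyA, rowA), (keyB, rowB)) =>
      require(keyA == keyB, "inputs are not identically keyed")
      keyA -> (rowA ++ rowB)
    }

  // Path 2: no alignment guarantee, so rows are matched by key.
  // Assumption: a row missing on one side is treated as all zeros.
  def cbindJoin(a: Seq[Row], b: Seq[Row], n1: Int, n2: Int): Seq[Row] = {
    val mapA = a.toMap
    val mapB = b.toMap
    (mapA.keySet ++ mapB.keySet).toSeq.sorted.map { key =>
      val rowA = mapA.getOrElse(key, Vector.fill(n1)(0.0))
      val rowB = mapB.getOrElse(key, Vector.fill(n2)(0.0))
      key -> (rowA ++ rowB)
    }
  }
}
```

The zip path touches each row once and needs no shuffle by key, which is
why it is preferred whenever the operands are known to be aligned.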



--
This message was sent by Atlassian JIRA
(v6.2#6252)
