Kousuke Saruta has posted comments on this change.

Change subject: KuduRDD.collect fails because of NoSerializableException
......................................................................


Patch Set 4:

> > Can we have any compatibility issue? Do you have any examples?
 > 
 > Yes, by implementing io.Serializable a class representation is
 > constrained to the fields in the original format.  See Effective
 > Java item 74 for more reasons why the Serializable API should be
 > avoided.
 > 
 > >  Kryo can't serialize/deserialize guava's ImmutableList
 > 
 > Kryo support for guava collections is provided by 
 > https://github.com/magro/kryo-serializers,
 > if you add that jar to the classpath I believe that error should go
 > away.
 > 
 > <dependency>
 > <groupId>de.javakaffee</groupId>
 > <artifactId>kryo-serializers</artifactId>
 > <version>0.41</version>
 > </dependency>

Thanks for the reply.

 > Yes, by implementing io.Serializable a class representation is
 > constrained to the fields in the original format.  See Effective
 > Java item 74 for more reasons why the Serializable API should be
 > avoided.

Yeah, I know the generalization but what I want to know is whether the 
constraint really affects Kudu with Spark.

 > Kryo support for guava collections is provided by 
 > https://github.com/magro/kryo-serializers,
 > if you add that jar to the classpath I believe that error should go
 > away.
 > 
 > <dependency>
 > <groupId>de.javakaffee</groupId>
 > <artifactId>kryo-serializers</artifactId>
 > <version>0.41</version>
 > </dependency>

Thanks for the information. Actually I know the library but I think it's not 
good solution because of following two reasons.

First, to use the library, we need to register the serializer for ImmutableList 
like "ImmutableListSerializer.register(kryo)"
and this method require a Kryo instance but Spark doesn't expose the instance 
to users. So we can't leverage the library.
Second, why users need to know concrete class representation (it's 
columnIndexByName in this case ). I don't think it's good
design to enforce users to know the concrete representation.


I have another option which doesn't need to make the classes Serializable but 
need to modify KuduRDD and KuduRow a little like as follows.

```
/**
  * A Spark SQL [[Row]] iterator which wraps a [[KuduScanner]].
  * @param scanner the wrapped scanner
  */
private[spark] class RowResultIteratorScala(private val scanner: KuduScanner) 
extends Iterator[Row] {

  <skip>

  override def next(): Row = {
    val rowResult = currentIterator.next()
    val length = rowResult.getColumnProjection.getColumnCount
    val array = new Array[Any](length)
    var i = 0

    while (i < length) {
      array(i) = 
        if (rowResult.isNull(i)) null
        else rowResult.getColumnType(i) match {
          case Type.BOOL => rowResult.getBoolean(i)
          case Type.INT8 => rowResult.getByte(i)
          case Type.INT16 => rowResult.getShort(i)
          case Type.INT32 => rowResult.getInt(i)
          case Type.INT64 => rowResult.getLong(i)
          case Type.UNIXTIME_MICROS => 
KuduRelation.microsToTimestamp(rowResult.getLong(i))
          case Type.FLOAT => rowResult.getFloat(i)
          case Type.DOUBLE => rowResult.getDouble(i)
          case Type.STRING => rowResult.getString(i)
          case Type.BINARY => rowResult.getBinaryCopy(i)
        }
        i += 1
    }
    new KuduRow(array)
  }
}

/**
  * A Spark SQL [[Row]] which wraps a Kudu [[RowResult]].
  * @param values the row result represented as array.
  */
private[spark] class KuduRow(private val values: Array[Any]) extends Row {
  override def length: Int = values.length

  override def get(i: Int): Any = values(i)

  override def copy(): Row = Row.fromSeq(Range(0, length).map(get))
}
```

How about this option?

-- 
To view, visit http://gerrit.cloudera.org:8080/5496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If0463424481a33c66fd7464345c305062420cfe9
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Kousuke Saruta <saru...@oss.nttdata.co.jp>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Kousuke Saruta <saru...@oss.nttdata.co.jp>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No

Reply via email to