Hi All, I've finished my GSoC project, but I've run into a problem. I've implemented a Spark backend for Gora and written a word count test class for it. Here is the test method in question: https://github.com/kamaci/gora/blob/master/gora-hbase/src/test/java/org/apache/gora/hbase/mapreduce/TestHBaseStoreWordCount.java#L65
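In outline, the test does something like the following. This is only a simplified sketch, not the actual test code (sc, dataStore, and the WebPage type here are stand-ins; see the linked file for the real thing):

    import java.util.Arrays;

    import org.apache.gora.spark.GoraSparkEngine;
    import org.apache.spark.api.java.JavaPairRDD;

    import scala.Tuple2;

    // Key/value types below are placeholders, not the actual test's types.
    GoraSparkEngine<String, WebPage> engine =
        new GoraSparkEngine<>(String.class, WebPage.class);

    // initialize() wraps the store in an RDD; behind the scenes Gora's
    // partition queries become GoraInputSplits, which Spark serializes out to
    // its executors -- the readFields() calls in the stack trace below are
    // those splits being deserialized on an executor.
    JavaPairRDD<String, WebPage> goraRDD = engine.initialize(sc, dataStore);

    JavaPairRDD<String, Long> counts = goraRDD
        .values()                                         // drop the keys
        .flatMap(page -> Arrays.asList(
            page.getContent().toString().split(" ")))     // text -> words
        .mapToPair(word -> new Tuple2<>(word, 1L))        // (word, 1)
        .reduceByKey((a, b) -> a + b);                    // sum the 1s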
When I run this test there should be no need to start up an HBase cluster by hand, because Spark should connect to my dummy (mini) cluster. However, when I run the test method it throws an error. Here is part of the stack trace:

2015-08-27 01:03:29,602 WARN [Executor task launch worker-0-SendThread(localhost:2181)] zookeeper.ClientCnxn (ClientCnxn.java:run(1089)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2015-08-27 01:03:29,704 WARN [Executor task launch worker-0] zookeeper.RecoverableZooKeeper (RecoverableZooKeeper.java:retryOrThrow(276)) - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
2015-08-27 01:03:29,704 ERROR [Executor task launch worker-0] zookeeper.RecoverableZooKeeper (RecoverableZooKeeper.java:retryOrThrow(278)) - ZooKeeper exists failed after 4 attempts
2015-08-27 01:03:29,704 WARN [Executor task launch worker-0] zookeeper.ZKUtil (ZKUtil.java:watchAndCheckExists(434)) - catalogtracker-on-hconnection-0x18cd6fd6, quorum=localhost:2181, baseZNode=/hbase Unable to set watcher on znode /hbase/meta-region-server
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:222)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:199)
    at org.apache.hadoop.hbase.client.HBaseAdmin.startCatalogTracker(HBaseAdmin.java:261)
    at org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:234)
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:305)
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:321)
    at org.apache.gora.hbase.store.HBaseStore.schemaExists(HBaseStore.java:197)
    at org.apache.gora.hbase.store.HBaseStore.createSchema(HBaseStore.java:170)
    at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:147)
    at org.apache.gora.store.impl.DataStoreBase.readFields(DataStoreBase.java:213)
    at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:215)
    at org.apache.gora.query.impl.PartitionQueryImpl.readFields(PartitionQueryImpl.java:151)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
    at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:228)
    at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:248)
    at org.apache.gora.mapreduce.GoraInputSplit.readFields(GoraInputSplit.java:76)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
    at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1138)
    at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

When I check the logs I see that the mini cluster and Spark started up correctly:

2015-08-27 01:02:29,067 INFO [main] hdfs.MiniDFSCluster (MiniDFSCluster.java:waitActive(2055)) - Cluster is active
2015-08-27 01:02:29,181 INFO [main] zookeeper.MiniZooKeeperCluster (MiniZooKeeperCluster.java:startup(200)) - Started MiniZK Cluster and connect 1 ZK server on client port: 63668
2015-08-27 01:02:45,233 INFO [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'sparkDriver' on port 60494.

Then I realized that when I start up an HBase instance from the command line, my Spark test method connects to it! So it doesn't connect to the dummy cluster; it tries to connect to the default one (notice quorum=localhost:2181 in the trace above, while the MiniZK cluster is listening on client port 63668). Any ideas about how to solve this connection problem?
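For now, the workaround I plan to try is pinning the mini cluster's ZooKeeper address on the Configuration that travels with the job, so that an HBaseStore deserialized on an executor doesn't fall back to the default localhost:2181. An untested sketch; testUtil is a stand-in for however the test exposes its HBaseTestingUtility:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    // Untested idea: copy the mini cluster's ZooKeeper settings onto the conf
    // the job carries, instead of relying on whatever hbase-site.xml the
    // executor classpath happens to provide. 'testUtil' is a placeholder for
    // the test's HBaseTestingUtility instance.
    Configuration conf = HBaseConfiguration.create(testUtil.getConfiguration());
    conf.set("hbase.zookeeper.quorum", "localhost");
    conf.set("hbase.zookeeper.property.clientPort",
        String.valueOf(testUtil.getZkCluster().getClientPort())); // 63668 in the run above

If the executors still log quorum=localhost:2181 after that, it would suggest that the serialized GoraInputSplit isn't carrying this conf, and that HBaseStore.initialize() (visible in the trace above) rebuilds its configuration from the classpath instead.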
PS 1: I've ignored the test in my GitHub repository.
PS 2: I don't think the problem is on the Spark side.
PS 3: I'll upload the full stack trace to https://issues.apache.org/jira/browse/GORA-386

Kind Regards,
Furkan KAMACI
