org.apache.spark.sql.sources.DDLException: Unsupported dataType: [1.1] failure: ``varchar'' expected but identifier char found in spark-sql
Hi, I am not sure whether this has been reported already. I ran into this error in the spark-sql shell, built from the latest Spark git trunk:

spark-sql> describe qiuzhuang_hcatlog_import;
15/02/17 14:38:36 ERROR SparkSQLDriver: Failed in [describe qiuzhuang_hcatlog_import]
org.apache.spark.sql.sources.DDLException: Unsupported dataType: [1.1] failure: ``varchar'' expected but identifier char found

char(32)
^
    at org.apache.spark.sql.sources.DDLParser.parseType(ddl.scala:52)
    at org.apache.spark.sql.hive.MetastoreRelation$SchemaAttribute.toAttribute(HiveMetastoreCatalog.scala:664)
    at org.apache.spark.sql.hive.MetastoreRelation$$anonfun$23.apply(HiveMetastoreCatalog.scala:674)
    at org.apache.spark.sql.hive.MetastoreRelation$$anonfun$23.apply(HiveMetastoreCatalog.scala:674)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.hive.MetastoreRelation.<init>(HiveMetastoreCatalog.scala:674)
    at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:185)
    at org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:234)

In the Hive 0.13.1 console, the same command works:

hive> describe qiuzhuang_hcatlog_import;
OK
id              char(32)
assistant_no    varchar(20)
assistant_name  varchar(32)
assistant_type  int
grade           int
shop_no         varchar(20)
shop_name       varchar(64)
organ_no        varchar(20)
organ_name      varchar(20)
entry_date      string
education       int
commission      decimal(8,2)
tel             varchar(20)
address         varchar(100)
identity_card   varchar(25)
sex             int
birthday        string
employee_type   int
status          int
remark          varchar(255)
create_user_no  varchar(20)
create_user     varchar(32)
create_time     string
update_user_no  varchar(20)
update_user     varchar(32)
update_time     string
Time taken: 0.49 seconds, Fetched: 26 row(s)
hive>

Regards,
Qiuzhuang
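Until the DDL parser handles CHAR(n), one possible workaround (a rough sketch, not a verified fix against this build; the view name is made up and only a few columns are shown) is to expose the table through a Hive view that casts the CHAR column to STRING, so Spark's parser never sees a type it cannot parse:

    import org.apache.spark.sql.hive.HiveContext

    // Rough sketch, not verified against this build: wrap the table in a
    // view that casts the CHAR(32) column to STRING, so Spark's DDL parser
    // only sees types it understands. The view name is illustrative.
    val hc = new HiveContext(sc)  // sc: an existing SparkContext
    hc.sql("""
      CREATE VIEW IF NOT EXISTS qiuzhuang_import_view AS
      SELECT CAST(id AS STRING) AS id, assistant_no, assistant_name
      FROM qiuzhuang_hcatlog_import
    """)
    hc.sql("DESCRIBE qiuzhuang_import_view").collect().foreach(println)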
Too many open files error
Hi All,

While doing some ETL I ran into a 'Too many open files' error; the relevant logs follow.

Thanks,
Qiuzhuang

14/11/20 20:12:02 INFO collection.ExternalAppendOnlyMap: Thread 63 spilling in-memory map of 100.8 KB to disk (953 times so far)
14/11/20 20:12:02 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/spark-local-20141120200455-4137/2f/temp_local_f83cbf2f-60a4-4fbd-b5d2-32a0c569311b
java.io.FileNotFoundException: /tmp/spark-local-20141120200455-4137/2f/temp_local_f83cbf2f-60a4-4fbd-b5d2-32a0c569311b (Too many open files)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
    at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(BlockObjectWriter.scala:178)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:203)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:63)
    at org.apache.spark.util.collection.Spillable$class.maybeSpill(Spillable.scala:77)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.maybeSpill(ExternalAppendOnlyMap.scala:63)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:131)
    at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:160)
    at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:159)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
    at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
    at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
    at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/11/20 20:12:02 ERROR executor.Executor: Exception in task 0.0 in stage 36.0 (TID 20)
java.io.FileNotFoundException: /tmp/spark-local-20141120200455-4137/2f/temp_local_f83cbf2f-60a4-4fbd-b5d2-32a0c569311b (Too many open files)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
    at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:123)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:180)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:63)
    at org.apache.spark.util.collection.Spillable$class.maybeSpill(Spillable.scala:77)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.maybeSpill(ExternalAppendOnlyMap.scala:63)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:131)
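This error usually means the executor hit the OS file-descriptor limit while the external map was spilling many small files. Two things that may help (a sketch under those assumptions, not a verified fix for this particular job): raise the limit with ulimit -n on every worker, and enable shuffle file consolidation, which existed as an experimental flag in this era of Spark:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: reduce the number of simultaneously open shuffle files.
    // spark.shuffle.consolidateFiles was an experimental flag in Spark 1.x
    // that merges shuffle outputs per core instead of per map task.
    // Raising the OS limit still matters: run `ulimit -n 65536` (or edit
    // /etc/security/limits.conf) on each worker before starting Spark.
    val conf = new SparkConf()
      .setAppName("etl-job")
      .set("spark.shuffle.consolidateFiles", "true")
    val sc = new SparkContext(conf)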
src/main/resources/kv1.txt not found in the HiveFromSpark example
When running the HiveFromSpark example via the run-example script, I got this error:

FAILED: SemanticException Line 1:23 Invalid path ''src/main/resources/kv1.txt'': No files matching path file:/home/kand/javaprojects/spark/src/main/resources/kv1.txt
== END HIVE FAILURE OUTPUT ==
Exception in thread "main" org.apache.spark.sql.execution.QueryExecutionException: FAILED: SemanticException Line 1:23 Invalid path ''src/main/resources/kv1.txt'': No files matching path file:/home/kand/javaprojects/spark/src/main/resources/kv1.txt

I think the hardcoded path in that class should be examples/src/main/resources/kv1.txt instead of src/main/resources/kv1.txt.

Thanks,
Qiuzhuang
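A more robust fix than changing one hardcoded string might be to resolve the file against SPARK_HOME when it is set. This is a sketch only; the candidate paths are assumptions based on the source-tree layout, not the exact code in HiveFromSpark:

    import java.io.File

    // Sketch: locate kv1.txt whether the example is launched from the
    // Spark root or from inside the examples/ module.
    def resolveKv1(): String = {
      val candidates = Seq(
        sys.env.get("SPARK_HOME").map(h => s"$h/examples/src/main/resources/kv1.txt"),
        Some("examples/src/main/resources/kv1.txt"),
        Some("src/main/resources/kv1.txt")
      ).flatten
      candidates.find(p => new File(p).exists())
        .getOrElse(sys.error(s"kv1.txt not found; tried: ${candidates.mkString(", ")}"))
    }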
serialVersionUID incompatible error in class BlockManagerId
Hi,

I updated from git today and, when connecting to the Spark cluster, I got a serialVersionUID-incompatible error in class BlockManagerId. Here is the log. Shouldn't we give BlockManagerId a fixed serialVersionUID to avoid this?

Thanks,
Qiuzhuang

scala> val rdd = sc.parallelize(1 to 1000)
14/10/25 09:10:48 ERROR Remoting: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = 4657685702603429489
java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = 4657685702603429489
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
    at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
    at scala.util.Try$.apply(Try.scala:161)
    at akka.serialization.Serialization.deserialize(Serialization.scala:98)
    at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
    at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58)
    at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58)
    at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:76)
    at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/10/25 09:10:48 ERROR SparkDeploySchedulerBackend: Asked to remove non existant executor 1
14/10/25 09:11:21 ERROR Remoting: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = 4657685702603429489
java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = 4657685702603429489
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
    at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
    at scala.util.Try$.apply(Try.scala:161)
    at
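For reference, the fix suggested above would look something like this. This is a sketch of the idea only; the real BlockManagerId in the Spark source has different members and a private constructor, and the UID value here is arbitrary:

    // Sketch: pin an explicit serialVersionUID so the JVM stops deriving
    // one from the class shape, which changes whenever fields or methods
    // are added. Fields shown are illustrative, not Spark's actual ones.
    @SerialVersionUID(1L)
    class BlockManagerId(val executorId: String, val host: String, val port: Int)
      extends Serializable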
Re: serialVersionUID incompatible error in class BlockManagerId
I updated git trunk and built on the two Linux machines, so I think they should have the same version. I am going to do a forced clean build and then retry. Thanks.

On Sat, Oct 25, 2014 at 9:23 AM, Josh Rosen <rosenvi...@gmail.com> wrote:

Are all processes (Master, Worker, Executors, Driver) running the same Spark build? This error implies that you’re seeing protocol / binary incompatibilities between your Spark driver and cluster.

Spark is API-compatible across the 1.x series, but we don’t make binary link-level compatibility guarantees: https://cwiki.apache.org/confluence/display/SPARK/Spark+Versioning+Policy. This means that your Spark driver’s runtime classpath should use the same version of Spark that’s installed on your cluster. You can *compile* against a different API-compatible version of Spark, but the runtime versions must match across all components.

To fix this issue, I’d check that you’ve run the “package” and “assembly” phases and that your Spark cluster is using this updated version.

- Josh

On October 24, 2014 at 6:17:26 PM, Qiuzhuang Lian (qiuzhuang.l...@gmail.com) wrote:

Hi,

I updated from git today and, when connecting to the Spark cluster, I got a serialVersionUID-incompatible error in class BlockManagerId. Here is the log. Shouldn't we give BlockManagerId a fixed serialVersionUID to avoid this?

Thanks,
Qiuzhuang

scala> val rdd = sc.parallelize(1 to 1000)
14/10/25 09:10:48 ERROR Remoting: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = 4657685702603429489
java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = 4657685702603429489
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
    at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
    at scala.util.Try$.apply(Try.scala:161)
    at akka.serialization.Serialization.deserialize(Serialization.scala:98)
    at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
    at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58)
    at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58)
    at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:76)
    at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/10/25 09:10:48 ERROR SparkDeploySchedulerBackend: Asked to remove non existant executor 1
14/10/25 09:11:21 ERROR Remoting: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = 4657685702603429489
java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = 4657685702603429489
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771
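A quick sanity check for the mismatch Josh describes (a sketch; it assumes sc.version is available in this build) is to print the driver's Spark version and compare it with the version the cluster's assembly jar was built from:

    // Sketch: the driver reports the Spark version on its runtime classpath.
    // If this differs from the build deployed on the master/workers, Java
    // serialization of classes such as BlockManagerId can fail with
    // InvalidClassException even though the APIs are source-compatible.
    println(s"Driver Spark version: ${sc.version}")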
Re: Run ScalaTest inside Intellij IDEA
)
    at scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:2991)
    at scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder.genClass(GenASM.scala:1371)
    at scala.tools.nsc.backend.jvm.GenASM$AsmPhase.run(GenASM.scala:120)
    at scala.tools.nsc.Global$Run.compileUnitsInternal(Global.scala:1583)
    at scala.tools.nsc.Global$Run.compileUnits(Global.scala:1557)
    at scala.tools.nsc.Global$Run.compileSources(Global.scala:1553)
    at scala.tools.nsc.Global$Run.compile(Global.scala:1662)
    at xsbt.CachedCompiler0.run(CompilerInterface.scala:126)
    at xsbt.CachedCompiler0.run(CompilerInterface.scala:102)
    at xsbt.CompilerInterface.run(CompilerInterface.scala:27)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:102)
    at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:48)
    at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41)
    at org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:28)
    at org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:25)
    at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:64)
    at org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:22)
    at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319)

On Jun 11, 2014, at 11:17 AM, Qiuzhuang Lian <qiuzhuang.l...@gmail.com> wrote:

I also ran into this problem when running the examples in IDEA. The issue seems to be that the build depends on too many jars and the classpath hits a length limit, so I imported the assembly jar, put it at the head of the dependency list, and it works.

Thanks,
Qiuzhuang

On Wed, Jun 11, 2014 at 10:39 AM, 申毅杰 <henry.yijies...@gmail.com> wrote:

Hi All,

I want to run a ScalaTest suite in IDEA directly, but it doesn’t pass the make phase before the tests run. The problems are as follows:

/Users/yijie/code/apache.spark.master/core/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala
Error:(44, 35) type mismatch;
 found   : org.apache.mesos.protobuf.ByteString
 required: com.google.protobuf.ByteString
        .setData(ByteString.copyFrom(data))
                                    ^
/Users/yijie/code/apache.spark.master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
Error:(119, 35) type mismatch;
 found   : org.apache.mesos.protobuf.ByteString
 required: com.google.protobuf.ByteString
        .setData(ByteString.copyFrom(createExecArg()))
                                    ^
Error:(257, 35) type mismatch;
 found   : org.apache.mesos.protobuf.ByteString
 required: com.google.protobuf.ByteString
        .setData(ByteString.copyFrom(task.serializedTask))
                                    ^

Before running the tests in IDEA, I built Spark with ’sbt/sbt assembly’, imported the projects into IDEA after ’sbt/sbt gen-idea’, and was able to run the tests in a terminal with ’sbt/sbt test’.

Is there anything I left out in order to run/debug a test suite inside IDEA?

Best regards,
Yijie
How to make the Spark SBT build use the localRepository defined in Maven's settings.xml
Hi,

I customized the localRepository tag in MVN_HOME/conf/settings.xml to manage my local Maven jars:

<localRepository>F:/Java/maven-build/.m2/repository</localRepository>

However, when I build Spark with SBT, it still uses the default .m2 repository under Path.userHome + "/.m2/repository". How can I make SBT pick up my customized localRepository instead?

Thanks,
Qiuzhuang
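As far as I know, sbt does not read Maven's settings.xml, so the custom location has to be declared in the build definition itself. A minimal sketch, reusing the path from the question (the resolver name is arbitrary):

    // build.sbt sketch: add the custom Maven local repository as an explicit
    // resolver. sbt's built-in Resolver.mavenLocal ignores settings.xml and
    // always points at ~/.m2/repository.
    resolvers += "custom-maven-local" at "file:///F:/Java/maven-build/.m2/repository"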