type mismatch
val schemas = createSchemas(config)
val arr = new Array[String](schemas.size())
lines.map(x => {
  val obj = JSON.parseObject(x)
  val vs = new Array[Any](schemas.size())
  for (i <- 0 until schemas.size()) {
    arr(i) = schemas.get(i).name
    vs(i) = obj.getString(schemas.get(i).name)  // read from the parsed object, not the raw string x
  }
  val seq = Seq(vs: _*)
  val record = Row.fromSeq(seq)
  record
})(Encoders.javaSerialization(Row.getClass))
  .toDF(arr: _*)

I get a type mismatch error:

  type mismatch;
   found   : Class[?0] where type ?0 <: org.apache.spark.sql.Row.type
   required: Class[org.apache.spark.sql.Row]
        })(Encoders.javaSerialization(Row.getClass))

igyu
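The cause: in Scala, Row.getClass is evaluated on the Row companion object, so its static type is Class[Row.type], not Class[Row]. A minimal sketch of the usual fix, reusing the post's lines and arr (whether java serialization or a schema-based encoder fits better depends on the surrounding code):

  // classOf[Row] is the Class token for the Row trait itself, which matches
  // the Class[org.apache.spark.sql.Row] the compiler asks for.
  import org.apache.spark.sql.{Encoders, Row}

  val rowSerializationEncoder = Encoders.javaSerialization(classOf[Row])

  // Java serialization carries no column information; if a StructType can be
  // built from the schemas, Spark 2.x's RowEncoder gives the resulting
  // Dataset a real schema instead:
  // val rowEncoder = org.apache.spark.sql.catalyst.encoders.RowEncoder(structType)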
Spark FlatMapGroupsWithStateFunction throws cannot resolve 'named_struct()' due to data type mismatch 'SerializeFromObject'
Hello, I am using FlatMapGroupsWithStateFunction in my Spark streaming application:

  FlatMapGroupsWithStateFunction idstateUpdateFunction = new FlatMapGroupsWithStateFunction() { ... };

The SessionUpdate class runs into trouble when the highlighted code (marked with asterisks) is added, which throws the exception below. The same milestones attribute with setter/getter has been added to SessionInfo (the input class), but it does not throw an exception there.

  public static class SessionUpdate implements Serializable {

      private static final long serialVersionUID = -3858977319192658483L;

      private String instanceId;
      *private ArrayList milestones = new ArrayList();*
      private Timestamp processingTimeoutTimestamp;

      public SessionUpdate() {
          super();
      }

      public SessionUpdate(String instanceId, *ArrayList milestones*,
              Timestamp processingTimeoutTimestamp) {
          super();
          this.instanceId = instanceId;
          *this.milestones = milestones;*
          this.processingTimeoutTimestamp = processingTimeoutTimestamp;
      }

      public String getInstanceId() {
          return instanceId;
      }

      public void setInstanceId(String instanceId) {
          this.instanceId = instanceId;
      }

      *public ArrayList getMilestones() {*
      *    return milestones;*
      *}*

      *public void setMilestones(ArrayList milestones) {*
      *    this.milestones = milestones;*
      *}*

      public Timestamp getProcessingTimeoutTimestamp() {
          return processingTimeoutTimestamp;
      }

      public void setProcessingTimeoutTimestamp(Timestamp processingTimeoutTimestamp) {
          this.processingTimeoutTimestamp = processingTimeoutTimestamp;
      }
  }

Exception:

ERROR cannot resolve 'named_struct()' due to data type mismatch: input to function named_struct requires at least one argument;;
'SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(input[0, oracle.insight.spark.event_processor.EventProcessor$SessionUpdate, true]).getInstanceId, true, false) AS instanceId#62, mapobjects(MapObjects_loopValue2, MapObjects_loopIsNull2, ObjectType(class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema), if (isnull(lambdavariable(MapObjects_loopValue2, MapObjects_loopIsNull2, ObjectType(class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema), true))) null else named_struct(), assertnotnull(input[0, oracle.insight.spark.event_processor.EventProcessor$SessionUpdate, true]).getMilestones, None) AS milestones#63, staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, TimestampType, fromJavaTimestamp, assertnotnull(input[0, oracle.insight.spark.event_processor.EventProcessor$SessionUpdate, true]).getProcessingTimeoutTimestamp, true, false) AS processingTimeoutTimestamp#64]
+- FlatMapGroupsWithState , cast(value#54 as string).toString, createexternalrow(EventTime#23.toString, InstanceID#24.toString, Model#25.toString, Milestone#26.toString, Region#27.toString, SalesOrganization#28.toString, ProductName#29.toString, ReasonForQuoteReject#30.toString, ReasonforRejectionBy#31.toString, OpportunityAmount#32.toJavaBigDecimal, Discount#33.toJavaBigDecimal, TotalQuoteAmount#34.toJavaBigDecimal, NetQuoteAmount#35.toJavaBigDecimal, ApprovedDiscount#36.toJavaBigDecimal, TotalOrderAmount#37.toJavaBigDecimal, StructField(EventTime,StringType,true), StructField(InstanceID,StringType,true), StructField(Model,StringType,true), StructField(Milestone,StringType,true), StructField(Region,StringType,true), StructField(SalesOrganization,StringType,true), StructField(ProductName,StringType,true), StructField(ReasonForQuoteReject,StringType,true), StructField(ReasonforRejectionBy,StringType,true), ...
6 more fields), [value#54], [EventTime#23, InstanceID#24, Model#25, Milestone#26, Region#27, SalesOrganization#28, ProductName#29, ReasonForQuoteReject#30, ReasonforRejectionBy#31, OpportunityAmount#32, Discount#33, TotalQuoteAmount#34, NetQuoteAmount#35, ApprovedDiscount#36, TotalOrderAmount#37], obj#61: oracle.insight.spark.event_processor.EventProcessor$SessionUpdate, class[instanceId[0]: string, milestones[0]: array>, processingTimeoutTimestamp[0]: timestamp], Append, false, ProcessingTimeTimeout

Schema looks like:

  {"Name":"EventTime", "DataType":"TimestampType"},
  {"Name":"InstanceID", "DataType":"STRING", "Length":100},
  {"Name":"Model", "DataType":"STRING", "Length":100},
  {"Name":"Milestone", "DataType":"STRING", "Length":100},
  {"Name":"Region", "DataType":"STRING", "Length":100},
  {"Name":"SalesOrganization", "DataType":"STRING", "Length":100},
  {"Name":"ProductName", "DataType":"STRING", "Length":100},
  {"Name":"ReasonForQuoteReject", "DataType":"STRING", "Length":100},
  {
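A reading of the plan above: the encoder mapped each milestones element to a GenericRowWithSchema with no known fields, so serialization becomes named_struct() with zero arguments, which the analyzer rejects. The bean encoder needs a concretely parameterized element type with bean-style getters/setters so it can derive the struct fields. A sketch of that constraint in Scala for brevity (the Milestone element type here is a hypothetical, not from the original post); the same idea applies to the Java bean encoder with a List of a concrete bean type:

  // Sketch, assuming Spark 2.x Datasets: an element type with derivable fields
  // (a case class here) lets the encoder emit named_struct("name", ...) rather
  // than an empty named_struct().
  import org.apache.spark.sql.Encoders

  case class Milestone(name: String, reachedAt: java.sql.Timestamp)  // hypothetical element type
  case class SessionUpdate(instanceId: String,
                           milestones: Seq[Milestone],
                           processingTimeoutTimestamp: java.sql.Timestamp)

  val sessionUpdateEncoder = Encoders.product[SessionUpdate]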
Re: Error: Type mismatch error when passing hdfs file path to spark-csv load method
On the line preceding the one that the compiler is complaining about (which doesn't actually have a problem in itself), you declare df as "df" + fileName, making it a String. Then you try to assign a DataFrame to df, but it's already a String. I don't quite understand your intent with that previous line, but I'm guessing you didn't mean to assign a string to df.

~ Jonathan

On Sun, Feb 21, 2016 at 8:45 PM Divya Gehlot <divya.htco...@gmail.com> wrote:
> Hi,
> I am trying to dynamically create DataFrames by reading subdirectories
> under a parent directory. My code looks like:
>
>   import org.apache.spark._
>   import org.apache.spark.sql._
>   val hadoopConf = new org.apache.hadoop.conf.Configuration()
>   val hdfsConn = org.apache.hadoop.fs.FileSystem.get(
>     new java.net.URI("hdfs://xxx.xx.xx.xxx:8020"), hadoopConf)
>   hdfsConn.listStatus(
>     new org.apache.hadoop.fs.Path("/TestDivya/Spark/ParentDir/")).foreach { fileStatus =>
>     val filePathName = fileStatus.getPath().toString()
>     val fileName = fileStatus.getPath().getName().toLowerCase()
>     var df = "df" + fileName
>     df = sqlContext.read.format("com.databricks.spark.csv")
>       .option("header", "true").option("inferSchema", "true").load(filePathName)
>   }
>
> getting below error:
>
>   <console>:35: error: type mismatch;
>    found   : org.apache.spark.sql.DataFrame
>    required: String
>          df = sqlContext.read.format("com.databricks.spark.csv")
>            .option("header", "true").option("inferSchema", "true").load(filePathName)
>
> Am I missing something?
>
> Would really appreciate the help.
>
> Thanks,
> Divya
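For reference, a minimal sketch of what the dynamic-naming intent usually resolves to: variables named "df" + fileName can't be created at runtime, but the frames can be collected into a Map keyed by file name (same reader and options as the post above):

  // Sketch: one DataFrame per file, looked up by name instead of by variable.
  val dfByName = hdfsConn
    .listStatus(new org.apache.hadoop.fs.Path("/TestDivya/Spark/ParentDir/"))
    .map { fileStatus =>
      val filePathName = fileStatus.getPath().toString()
      val fileName = fileStatus.getPath().getName().toLowerCase()
      fileName -> sqlContext.read.format("com.databricks.spark.csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load(filePathName)
    }.toMap

  // e.g. dfByName("sales.csv").show()   // hypothetical file name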
Error: Type mismatch error when passing hdfs file path to spark-csv load method
Hi,
I am trying to dynamically create DataFrames by reading subdirectories under a parent directory. My code looks like:

  import org.apache.spark._
  import org.apache.spark.sql._
  val hadoopConf = new org.apache.hadoop.conf.Configuration()
  val hdfsConn = org.apache.hadoop.fs.FileSystem.get(
    new java.net.URI("hdfs://xxx.xx.xx.xxx:8020"), hadoopConf)
  hdfsConn.listStatus(
    new org.apache.hadoop.fs.Path("/TestDivya/Spark/ParentDir/")).foreach { fileStatus =>
    val filePathName = fileStatus.getPath().toString()
    val fileName = fileStatus.getPath().getName().toLowerCase()
    var df = "df" + fileName
    df = sqlContext.read.format("com.databricks.spark.csv")
      .option("header", "true").option("inferSchema", "true").load(filePathName)
  }

getting below error:

  <console>:35: error: type mismatch;
   found   : org.apache.spark.sql.DataFrame
   required: String
         df = sqlContext.read.format("com.databricks.spark.csv")
           .option("header", "true").option("inferSchema", "true").load(filePathName)

Am I missing something?

Would really appreciate the help.

Thanks,
Divya
Re: trouble calculating TF-IDF data type mismatch: '(tf * idf)' requires numeric type, not vector;
you'll need the following function if you want to run the test code

Kind regards

Andy

  private DataFrame createData(JavaRDD<Row> rdd) {
      StructField id = new StructField("id", DataTypes.IntegerType, false, Metadata.empty());
      StructField label = new StructField("label", DataTypes.DoubleType, false, Metadata.empty());
      StructField words = new StructField("words",
              DataTypes.createArrayType(DataTypes.StringType), false, Metadata.empty());
      StructType schema = new StructType(new StructField[] { id, label, words });
      DataFrame ret = sqlContext.createDataFrame(rdd, schema);
      return ret;
  }

From: Andrew Davidson <a...@santacruzintegration.com>
Date: Wednesday, January 13, 2016 at 2:52 PM
To: "user @spark" <user@spark.apache.org>
Subject: trouble calculating TF-IDF data type mismatch: '(tf * idf)' requires numeric type, not vector;

> Below is a little snippet of my Java test code. Any idea how I implement
> member-wise vector multiplication?
>
> Also notice the idf value for 'Chinese' is 0.0? The calculation is
> ln((4+1) / (6/4 + 1)) = ln(2) = 0.6931 ??
>
> Also any idea if this code would work in a pipeline? I.e. is the pipeline
> smart about using cache()?
>
> Kind regards
>
> Andy
trouble calculating TF-IDF data type mismatch: '(tf * idf)' requires numeric type, not vector;
Below is a little snippet of my Java test code. Any idea how I implement member-wise vector multiplication?

Also notice the idf value for 'Chinese' is 0.0? The calculation is ln((4+1) / (6/4 + 1)) = ln(2) = 0.6931 ??

Also any idea if this code would work in a pipeline? I.e. is the pipeline smart about using cache()?

Kind regards

Andy

transformed df printSchema():

root
 |-- id: integer (nullable = false)
 |-- label: double (nullable = false)
 |-- words: array (nullable = false)
 |    |-- element: string (containsNull = true)
 |-- tf: vector (nullable = true)
 |-- idf: vector (nullable = true)

+---+-----+----------------------------+-------------------------+-------------------------------------------------------+
|id |label|words                       |tf                       |idf                                                    |
+---+-----+----------------------------+-------------------------+-------------------------------------------------------+
|0  |0.0  |[Chinese, Beijing, Chinese] |(7,[1,2],[2.0,1.0])      |(7,[1,2],[0.0,0.9162907318741551])                     |
|1  |0.0  |[Chinese, Chinese, Shanghai]|(7,[1,4],[2.0,1.0])      |(7,[1,4],[0.0,0.9162907318741551])                     |
|2  |0.0  |[Chinese, Macao]            |(7,[1,6],[1.0,1.0])      |(7,[1,6],[0.0,0.9162907318741551])                     |
|3  |1.0  |[Tokyo, Japan, Chinese]     |(7,[1,3,5],[1.0,1.0,1.0])|(7,[1,3,5],[0.0,0.9162907318741551,0.9162907318741551])|
+---+-----+----------------------------+-------------------------+-------------------------------------------------------+

  @Test
  public void test() {
      DataFrame rawTrainingDF = createTrainingData();
      DataFrame trainingDF = runPipleLineTF_IDF(rawTrainingDF);
      . . .
  }

  private DataFrame runPipleLineTF_IDF(DataFrame rawDF) {
      HashingTF hashingTF = new HashingTF()
              .setInputCol("words")
              .setOutputCol("tf")
              .setNumFeatures(dictionarySize);

      DataFrame termFrequenceDF = hashingTF.transform(rawDF);

      termFrequenceDF.cache(); // idf needs to make 2 passes over data set
      IDFModel idf = new IDF()
              //.setMinDocFreq(1) // our vocabulary has 6 words we hash into 7
              .setInputCol(hashingTF.getOutputCol())
              .setOutputCol("idf")
              .fit(termFrequenceDF);

      DataFrame tmp = idf.transform(termFrequenceDF);

      DataFrame ret = tmp.withColumn("features", tmp.col("tf").multiply(tmp.col("idf")));
      logger.warn("\ntransformed df printSchema()");
      ret.printSchema();
      ret.show(false);

      return ret;
  }

org.apache.spark.sql.AnalysisException: cannot resolve '(tf * idf)' due to data type mismatch: '(tf * idf)' requires numeric type, not vector;

  private DataFrame createTrainingData() {
      // make sure we only use dictionarySize words
      JavaRDD<Row> rdd = javaSparkContext.parallelize(Arrays.asList(
              // 0 is Chinese
              // 1 in notChinese
              RowFactory.create(0, 0.0, Arrays.asList("Chinese", "Beijing", "Chinese")),
              RowFactory.create(1, 0.0, Arrays.asList("Chinese", "Chinese", "Shanghai")),
              RowFactory.create(2, 0.0, Arrays.asList("Chinese", "Macao")),
              RowFactory.create(3, 1.0, Arrays.asList("Tokyo", "Japan", "Chinese"))));
      return createData(rdd);
  }

  private DataFrame createTestData() {
      JavaRDD<Row> rdd = javaSparkContext.parallelize(Arrays.asList(
              // 0 is Chinese
              // 1 in notChinese
              // "bernoulli" requires label to be IntegerType
              RowFactory.create(4, 1.0, Arrays.asList("Chinese", "Chinese", "Chinese", "Tokyo", "Japan"))));
      return createData(rdd);
  }
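Two notes that may explain the output, plus a sketch of the element-wise multiply. First, the 0.0 is expected: Spark computes IDF as ln((m + 1) / (df + 1)), and "Chinese" appears in all 4 documents, so ln(5/5) = 0. Second, IDFModel.transform already returns the TF vector scaled by IDF, so the extra multiplication may be redundant. If an element-wise product of two vector columns is still needed, Column.multiply only accepts numeric types, so it has to go through a UDF. A sketch in Scala for brevity, assuming Spark 1.x mllib vectors (the same approach is callable from Java):

  // vecMul multiplies two vectors element-wise; vector columns carry a UDT,
  // so a UDF over org.apache.spark.mllib.linalg.Vector works on them.
  import org.apache.spark.mllib.linalg.{Vector, Vectors}
  import org.apache.spark.sql.functions.udf

  val vecMul = udf { (tf: Vector, idf: Vector) =>
    Vectors.dense(tf.toArray.zip(idf.toArray).map { case (a, b) => a * b })
  }

  val ret = tmp.withColumn("features", vecMul(tmp.col("tf"), tmp.col("idf")))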
adding element into MutableList throws an error type mismatch
Hi All,
Could someone shed some light on why adding an element into a MutableList can result in a type mismatch, even if I'm sure that the class type is right? Below is the sample code I ran in the Spark 1.0.2 console; at the end, there is a type mismatch error:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.0.2
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
14/10/15 14:36:39 INFO spark.SecurityManager: Changing view acls to: hadoop
14/10/15 14:36:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop)
14/10/15 14:36:39 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/10/15 14:36:39 INFO Remoting: Starting remoting
14/10/15 14:36:39 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sp...@fphd4.ctpilot1.com:35293]
14/10/15 14:36:39 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sp...@fphd4.ctpilot1.com:35293]
14/10/15 14:36:39 INFO spark.SparkEnv: Registering MapOutputTracker
14/10/15 14:36:39 INFO spark.SparkEnv: Registering BlockManagerMaster
14/10/15 14:36:39 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141015143639-c62e
14/10/15 14:36:39 INFO storage.MemoryStore: MemoryStore started with capacity 294.4 MB.
14/10/15 14:36:39 INFO network.ConnectionManager: Bound socket to port 43236 with id = ConnectionManagerId(fphd4.ctpilot1.com,43236)
14/10/15 14:36:39 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/10/15 14:36:39 INFO storage.BlockManagerInfo: Registering block manager fphd4.ctpilot1.com:43236 with 294.4 MB RAM
14/10/15 14:36:39 INFO storage.BlockManagerMaster: Registered BlockManager
14/10/15 14:36:39 INFO spark.HttpServer: Starting HTTP Server
14/10/15 14:36:39 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/10/15 14:36:40 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:37164
14/10/15 14:36:40 INFO broadcast.HttpBroadcast: Broadcast server started at http://10.18.30.154:37164
14/10/15 14:36:40 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-34fc70ab-7c5d-4e79-9ae7-929fd47d4f36
14/10/15 14:36:40 INFO spark.HttpServer: Starting HTTP Server
14/10/15 14:36:40 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/10/15 14:36:40 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:47025
14/10/15 14:36:40 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/10/15 14:36:40 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/10/15 14:36:40 INFO ui.SparkUI: Started SparkUI at http://fphd4.ctpilot1.com:4040
14/10/15 14:36:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/15 14:36:40 INFO executor.Executor: Using REPL class URI: http://10.18.30.154:49669
14/10/15 14:36:40 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
scala> case class Dummy(x: String) {
     |   val data: String = x
     | }
defined class Dummy

scala> import scala.collection.mutable.MutableList
import scala.collection.mutable.MutableList

scala> val v = MutableList[Dummy]()
v: scala.collection.mutable.MutableList[Dummy] = MutableList()

scala> v += (new Dummy("a"))
<console>:16: error: type mismatch;
 found   : Dummy
 required: Dummy
              v += (new Dummy("a"))
Re: adding element into MutableList throws an error type mismatch
Another instance of https://issues.apache.org/jira/browse/SPARK-1199, fixed in subsequent versions.

On Wed, Oct 15, 2014 at 7:40 AM, Henry Hung <ythu...@winbond.com> wrote:
> Hi All,
> Could someone shed some light on why adding an element into a MutableList
> can result in a type mismatch, even if I'm sure that the class type is
> right? Below is the sample code I ran in the Spark 1.0.2 console; at the
> end, there is a type mismatch error:
>
>   scala> v += (new Dummy("a"))
>   <console>:16: error: type mismatch;
>    found   : Dummy
>    required: Dummy
>                 v += (new Dummy("a"))
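For readers hitting this on an affected release: the shell wraps each input line in its own generated class, so a case class defined on one line can be a different Dummy than the one the next line refers to. A sketch of the usual workaround (assuming the same Dummy class), entering the definition and its use as a single compilation unit via :paste:

  // Entered via :paste, so everything below compiles into one wrapper and
  // there is only one Dummy class in play.
  import scala.collection.mutable.MutableList

  case class Dummy(x: String) {
    val data: String = x
  }

  val v = MutableList[Dummy]()
  v += Dummy("a")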
Re: error: type mismatch while Union
Thank you Aaron for pointing out the problem. This only happens when I run this code in spark-shell but not when I submit the job.
Re: error: type mismatch while Union
I am using Spark version 1.0.2
Re: error: type mismatch while Union
Are you doing this from the spark-shell? You're probably running into https://issues.apache.org/jira/browse/SPARK-1199, which should be fixed in 1.1.

On Sat, Sep 6, 2014 at 3:03 AM, Dhimant <dhimant84.jays...@gmail.com> wrote:
> I am using Spark version 1.0.2
Re: error: type mismatch while Union
Which version are you using -- I can reproduce your issue w/ 0.9.2 but not with 1.0.1... so my guess is that it's a bug and the fix hasn't been backported... No idea on a workaround though..

On Fri, Sep 5, 2014 at 7:58 AM, Dhimant <dhimant84.jays...@gmail.com> wrote:
> Hi,
> I am getting a type mismatch error during a union operation. Can someone
> suggest a solution?
>
>   case class MyNumber(no: Int, secondVal: String) extends Serializable
>       with Ordered[MyNumber] {
>     override def toString(): String = this.no.toString + " " + this.secondVal
>     override def compare(that: MyNumber): Int = this.no compare that.no
>     override def compareTo(that: MyNumber): Int = this.no compare that.no
>     def Equals(that: MyNumber): Boolean = {
>       (this.no == that.no) && (that match {
>         case MyNumber(n1, n2) => n1 == no && n2 == secondVal
>         case _ => false
>       })
>     }
>   }
>   val numbers = sc.parallelize(1 to 20, 10)
>   val firstRdd = numbers.map(new MyNumber(_, "A"))
>   val secondRDD = numbers.map(new MyNumber(_, "B"))
>   val numberRdd = firstRdd.union(secondRDD)
>
>   <console>:24: error: type mismatch;
>    found   : org.apache.spark.rdd.RDD[MyNumber]
>    required: org.apache.spark.rdd.RDD[MyNumber]
>          val numberRdd = firstRdd.union(secondRDD)
error: type mismatch while assigning RDD to RDD val object
I am receiving the following error in the Spark shell while executing the following code:

  class LogRecrod(logLine: String) extends Serializable {
    val splitvals = logLine.split(",")
    val strIp: String = splitvals(0)
    val hostname: String = splitvals(1)
    val server_name: String = splitvals(2)
  }

  var logRecordRdd: org.apache.spark.rdd.RDD[LogRecrod] = _

  val sourceFile = sc.textFile("hdfs://192.168.1.30:9000/Data/Log_1406794333258.log", 2)
  14/09/04 12:08:28 INFO storage.MemoryStore: ensureFreeSpace(179585) called with curMem=0, maxMem=309225062
  14/09/04 12:08:28 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 175.4 KB, free 294.7 MB)
  sourceFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

  scala> logRecordRdd = sourceFile.map(line => new LogRecrod(line))
  <console>:18: error: type mismatch;
   found   : LogRecrod
   required: LogRecrod
         logRecordRdd = sourceFile.map(line => new LogRecrod(line))

Any suggestions to resolve this problem?
Re: error: type mismatch while assigning RDD to RDD val object
I think this is a known problem with the shell and case classes. Have a look at the JIRA: https://issues.apache.org/jira/browse/SPARK-1199

On Thu, Sep 4, 2014 at 7:56 AM, Dhimant <dhimant84.jays...@gmail.com> wrote:
> I am receiving the following error in the Spark shell while executing the
> following code:
>
>   scala> logRecordRdd = sourceFile.map(line => new LogRecrod(line))
>   <console>:18: error: type mismatch;
>    found   : LogRecrod
>    required: LogRecrod
>          logRecordRdd = sourceFile.map(line => new LogRecrod(line))
>
> Any suggestions to resolve this problem?
Custom Accumulator: Type Mismatch Error
Hello,

I have been trying to implement a custom accumulator as below:

  import org.apache.spark._

  class VectorNew1(val data: Array[Double]) {}

  implicit object VectorAP extends AccumulatorParam[VectorNew1] {
    def zero(v: VectorNew1) = new VectorNew1(new Array(v.data.size))
    def addInPlace(v1: VectorNew1, v2: VectorNew1) = {
      for (i <- 0 to v1.data.size - 1)
        v1.data(i) += v2.data(i)
      v1
    }
  }

  // Create an accumulator counter of length = number of columns, which is in
  // turn derived from the header
  val actualCounters1 = sc.accumulator(new VectorNew1(Array.fill[Double](2)(0)))

  onlySplitFile.foreach(oneRow => {
    //println("Here")
    //println(oneRow(0))
    //for (eachColumnValue <- oneRow) {
    actualCounters1 += new VectorNew1(Array(1, 1))
    //}
  })

I am receiving the following error:

  error: type mismatch;
   found   : VectorNew1
   required: VectorNew1
         actualCounters1 += new VectorNew1(Array(1, 1))

Could someone help me with this?

Thank You
Vinay
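The shape of this error (found and required naming the same class) matches the spark-shell class-wrapping issue from the threads above (SPARK-1199): each shell line gets its own wrapper class, so the VectorNew1 seen by the implicit AccumulatorParam is not the VectorNew1 constructed later. A sketch of the usual workaround under that assumption: enter the class, the param, and the code that uses them as one :paste block (or compile them into an application jar). onlySplitFile is the RDD from the original post:

  // Entered via :paste: one compilation unit, so one VectorNew1 class.
  import org.apache.spark.AccumulatorParam

  class VectorNew1(val data: Array[Double]) extends Serializable

  implicit object VectorAP extends AccumulatorParam[VectorNew1] {
    def zero(v: VectorNew1) = new VectorNew1(new Array[Double](v.data.size))
    def addInPlace(v1: VectorNew1, v2: VectorNew1) = {
      for (i <- v1.data.indices) v1.data(i) += v2.data(i)
      v1
    }
  }

  val actualCounters1 = sc.accumulator(new VectorNew1(Array.fill[Double](2)(0)))

  onlySplitFile.foreach { oneRow =>
    actualCounters1 += new VectorNew1(Array(1, 1))
  }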