type mismatch

2021-09-02 Thread igyu
val schemas = createSchemas(config)
val arr = new Array[String](schemas.size())

lines.map(x => {
  val obj = JSON.parseObject(x)
  val vs = new Array[Any](schemas.size())
  for (i <- 0 until schemas.size()) {
    arr(i) = schemas.get(i).name
    vs(i) = obj.getString(schemas.get(i).name)
  }

  val seq = Seq(vs: _*)
  val record = Row.fromSeq(seq)
  record
})(Encoders.javaSerialization(Row.getClass))
  .toDF(arr: _*)

I get an error:

type mismatch;
 found   : Class[?0] where type ?0 <: org.apache.spark.sql.Row.type
 required: Class[org.apache.spark.sql.Row]
})(Encoders.javaSerialization(Row.getClass))
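
A note on the fix (a sketch, not from the original message): Row.getClass is the
Class of the Row companion object, while Encoders.javaSerialization needs a
Class[org.apache.spark.sql.Row], which classOf[Row] provides. That makes the
compile error go away, but a Java-serialized Row column cannot then be split into
named columns with toDF, so the more usual route is a RowEncoder built from an
explicit schema. A minimal sketch, reusing lines, schemas and JSON from the snippet
above and assuming every column is a string (which may not match createSchemas(config)):

// Sketch only: build the schema up front and let RowEncoder supply both the
// encoder and the column names, instead of Encoders.javaSerialization + toDF.
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val sparkSchema = StructType(
  (0 until schemas.size()).map(i => StructField(schemas.get(i).name, StringType, nullable = true)))

val df = lines.map { x =>
    val obj = JSON.parseObject(x)
    Row.fromSeq((0 until schemas.size()).map(i => obj.getString(schemas.get(i).name)))
  }(RowEncoder(sparkSchema))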


igyu


Spark FlatMapGroupsWithStateFunction throws cannot resolve 'named_struct()' due to data type mismatch 'SerializeFromObject"

2018-09-17 Thread Kuttaiah Robin
Hello,

I am using FlatMapGroupsWithStateFunction in my Spark streaming application:

FlatMapGroupsWithStateFunction idstateUpdateFunction =
    new FlatMapGroupsWithStateFunction() { ... };


The SessionUpdate class runs into trouble once the milestones field (with its
constructor parameter and getter/setter, shown below) is added; it throws the
exception further down. The same milestones attribute with setter/getter has also
been added to SessionInfo (the input class), but it does not throw an exception there.

public static class SessionUpdate implements Serializable {

    private static final long serialVersionUID = -3858977319192658483L;

    private String instanceId;

    private ArrayList milestones = new ArrayList();

    private Timestamp processingTimeoutTimestamp;

    public SessionUpdate() {
      super();
    }

    public SessionUpdate(String instanceId, ArrayList milestones,
        Timestamp processingTimeoutTimestamp) {
      super();
      this.instanceId = instanceId;
      this.milestones = milestones;
      this.processingTimeoutTimestamp = processingTimeoutTimestamp;
    }

    public String getInstanceId() {
      return instanceId;
    }

    public void setInstanceId(String instanceId) {
      this.instanceId = instanceId;
    }

    public ArrayList getMilestones() {
      return milestones;
    }

    public void setMilestones(ArrayList milestones) {
      this.milestones = milestones;
    }

    public Timestamp getProcessingTimeoutTimestamp() {
      return processingTimeoutTimestamp;
    }

    public void setProcessingTimeoutTimestamp(Timestamp processingTimeoutTimestamp) {
      this.processingTimeoutTimestamp = processingTimeoutTimestamp;
    }

}

Exception:
ERROR  cannot resolve 'named_struct()' due to data type mismatch: input to
function named_struct requires at least one argument;;
'SerializeFromObject [staticinvoke(class
org.apache.spark.unsafe.types.UTF8String, StringType, fromString,
assertnotnull(input[0,
oracle.insight.spark.event_processor.EventProcessor$SessionUpdate,
true]).getInstanceId, true, false) AS instanceId#62,
mapobjects(MapObjects_loopValue2, MapObjects_loopIsNull2, ObjectType(class
org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema), if
(isnull(lambdavariable(MapObjects_loopValue2, MapObjects_loopIsNull2,
ObjectType(class
org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema), true)))
null else named_struct(), assertnotnull(input[0,
oracle.insight.spark.event_processor.EventProcessor$SessionUpdate,
true]).getMilestones, None) AS milestones#63, staticinvoke(class
org.apache.spark.sql.catalyst.util.DateTimeUtils$, TimestampType,
fromJavaTimestamp, assertnotnull(input[0,
oracle.insight.spark.event_processor.EventProcessor$SessionUpdate,
true]).getProcessingTimeoutTimestamp, true, false) AS
processingTimeoutTimestamp#64]
+- FlatMapGroupsWithState , cast(value#54 as string).toString,
createexternalrow(EventTime#23.toString, InstanceID#24.toString,
Model#25.toString, Milestone#26.toString, Region#27.toString,
SalesOrganization#28.toString, ProductName#29.toString,
ReasonForQuoteReject#30.toString, ReasonforRejectionBy#31.toString,
OpportunityAmount#32.toJavaBigDecimal, Discount#33.toJavaBigDecimal,
TotalQuoteAmount#34.toJavaBigDecimal, NetQuoteAmount#35.toJavaBigDecimal,
ApprovedDiscount#36.toJavaBigDecimal, TotalOrderAmount#37.toJavaBigDecimal,
StructField(EventTime,StringType,true),
StructField(InstanceID,StringType,true),
StructField(Model,StringType,true), StructField(Milestone,StringType,true),
StructField(Region,StringType,true),
StructField(SalesOrganization,StringType,true),
StructField(ProductName,StringType,true),
StructField(ReasonForQuoteReject,StringType,true),
StructField(ReasonforRejectionBy,StringType,true), ... 6 more fields),
[value#54], [EventTime#23, InstanceID#24, Model#25, Milestone#26,
Region#27, SalesOrganization#28, ProductName#29, ReasonForQuoteReject#30,
ReasonforRejectionBy#31, OpportunityAmount#32, Discount#33,
TotalQuoteAmount#34, NetQuoteAmount#35, ApprovedDiscount#36,
TotalOrderAmount#37], obj#61:
oracle.insight.spark.event_processor.EventProcessor$SessionUpdate,
class[instanceId[0]: string, milestones[0]: array>,
processingTimeoutTimestamp[0]: timestamp], Append, false,
ProcessingTimeTimeout
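
A reading of the plan above (my interpretation, not from the original message): the
empty named_struct() is the serializer the bean encoder generated for the milestones
array. The generic element type was stripped by the archive, but the plan shows it as
GenericRowWithSchema, for which the encoder has no field list, so it emits a
zero-field struct and the analyzer rejects it. A hedged sketch, in Scala only for
brevity, showing that a concretely typed element gives the array column a real schema:

// Hypothetical sketch, not the original code: with a statically typed element
// (String here), the derived schema for milestones becomes array<string>
// instead of an empty struct.
import java.sql.Timestamp
import org.apache.spark.sql.{Encoders, SparkSession}

case class SessionUpdateSketch(
    instanceId: String,
    milestones: Seq[String],
    processingTimeoutTimestamp: Timestamp)

object EncoderSchemaCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("encoder-check").getOrCreate()
    // Printing the schema of an empty Dataset shows what the encoder derives.
    spark.emptyDataset(Encoders.product[SessionUpdateSketch]).printSchema()
    spark.stop()
  }
}

The same idea applies to the Java bean: an element type whose fields Spark can map
(a String, or a small bean with getters/setters) avoids the empty struct.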


Schema looks like

{"Name":"EventTime","DataType":"TimestampType"},
{"Name":"InstanceID",   "DataType":"STRING",  "Length":100},
{"Name":"Model","DataType":"STRING",  "Length":100},
{"Name":"Milestone","DataType":"STRING",  "Length":100},
{"Name":"Region",   "DataType":"STRING",  "Length":100},
{"Name":"SalesOrganization","DataType":"STRING",  "Length":100},
{"Name":"ProductName",  "DataType":"STRING",  "Length":100},
{"Name":"ReasonForQuoteReject", "DataType":"STRING",  "Length":100},
{

Re: Error :Type mismatch error when passing hdfs file path to spark-csv load method

2016-02-21 Thread Jonathan Kelly
On the line preceding the one that the compiler is complaining about (which
doesn't actually have a problem in itself), you declare df as
"df"+fileName, making it a string. Then you try to assign a DataFrame to
df, but it's already a string. I don't quite understand your intent with
that previous line, but I'm guessing you didn't mean to assign a string to
df.

~ Jonathan
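
A sketch of one way to keep each per-directory DataFrame addressable by name (a
guess at the intent behind "df"+fileName, not something from the thread): store the
DataFrames in a map keyed by the file name instead of reusing a single variable.

// Sketch only: assumes the same spark-shell 1.x context (sqlContext, spark-csv)
// as the original snippet.
import scala.collection.mutable
import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = new org.apache.hadoop.conf.Configuration()
val hdfsConn = FileSystem.get(new java.net.URI("hdfs://xxx.xx.xx.xxx:8020"), hadoopConf)

val dfByName = mutable.Map[String, org.apache.spark.sql.DataFrame]()
hdfsConn.listStatus(new Path("/TestDivya/Spark/ParentDir/")).foreach { fileStatus =>
  val filePathName = fileStatus.getPath().toString()
  val fileName = fileStatus.getPath().getName().toLowerCase()
  dfByName(fileName) = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load(filePathName)
}
// dfByName("somefile.csv") now holds the DataFrame read from that sub-path.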
On Sun, Feb 21, 2016 at 8:45 PM Divya Gehlot <divya.htco...@gmail.com>
wrote:

> Hi,
> I am trying to dynamically create Dataframe by reading subdirectories
> under parent directory
>
> My code looks like
>
>> import org.apache.spark._
>> import org.apache.spark.sql._
>> val hadoopConf = new org.apache.hadoop.conf.Configuration()
>> val hdfsConn = org.apache.hadoop.fs.FileSystem.get(new
>> java.net.URI("hdfs://xxx.xx.xx.xxx:8020"), hadoopConf)
>> hdfsConn.listStatus(new
>> org.apache.hadoop.fs.Path("/TestDivya/Spark/ParentDir/")).foreach{
>> fileStatus =>
>>val filePathName = fileStatus.getPath().toString()
>>val fileName = fileStatus.getPath().getName().toLowerCase()
>>var df =  "df"+fileName
>>df =
>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>> "true").option("inferSchema", "true").load(filePathName)
>> }
>
>
> getting below error
>
>> :35: error: type mismatch;
>>  found   : org.apache.spark.sql.DataFrame
>>  required: String
>>  df =
>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>> "true").option("inferSchema", "true").load(filePathName)
>
>
> Am I missing something ?
>
> Would really appreciate the help .
>
>
> Thanks,
> Divya
>
>


Error :Type mismatch error when passing hdfs file path to spark-csv load method

2016-02-21 Thread Divya Gehlot
Hi,
I am trying to dynamically create DataFrames by reading the subdirectories under
a parent directory.

My code looks like

> import org.apache.spark._
> import org.apache.spark.sql._
> val hadoopConf = new org.apache.hadoop.conf.Configuration()
> val hdfsConn = org.apache.hadoop.fs.FileSystem.get(new
> java.net.URI("hdfs://xxx.xx.xx.xxx:8020"), hadoopConf)
> hdfsConn.listStatus(new
> org.apache.hadoop.fs.Path("/TestDivya/Spark/ParentDir/")).foreach{
> fileStatus =>
>val filePathName = fileStatus.getPath().toString()
>val fileName = fileStatus.getPath().getName().toLowerCase()
>var df =  "df"+fileName
>df =
> sqlContext.read.format("com.databricks.spark.csv").option("header",
> "true").option("inferSchema", "true").load(filePathName)
> }


getting below error

> :35: error: type mismatch;
>  found   : org.apache.spark.sql.DataFrame
>  required: String
>  df =
> sqlContext.read.format("com.databricks.spark.csv").option("header",
> "true").option("inferSchema", "true").load(filePathName)


Am I missing something?

Would really appreciate the help.


Thanks,
Divya


Re: trouble calculating TF-IDF data type mismatch: '(tf * idf)' requires numeric type, not vector;

2016-01-13 Thread Andy Davidson
you'll need the following function if you want to run the test code.

Kind regards 

Andy

private DataFrame createData(JavaRDD rdd) {
    StructField id = new StructField("id", DataTypes.IntegerType, false,
            Metadata.empty());

    StructField label = new StructField("label", DataTypes.DoubleType, false,
            Metadata.empty());

    StructField words = new StructField("words",
            DataTypes.createArrayType(DataTypes.StringType), false, Metadata.empty());

    StructType schema = new StructType(new StructField[] { id, label, words });

    DataFrame ret = sqlContext.createDataFrame(rdd, schema);

    return ret;
}


From:  Andrew Davidson <a...@santacruzintegration.com>
Date:  Wednesday, January 13, 2016 at 2:52 PM
To:  "user @spark" <user@spark.apache.org>
Subject:  trouble calculating TF-IDF data type mismatch: '(tf * idf)'
requires numeric type, not vector;

> Below is a little snippet of my Java test code. Any idea how I implement
> member-wise vector multiplication?
> 
> Also notice the idf value for 'Chinese' is 0.0? The calculation is ln((4+1) /
> (6/4 + 1)) = ln(2) = 0.6931 ??
> 
> Also any idea if this code would work in a pipe line? I.E. Is the pipeline
> smart about using cache() ?
> 
> Kind regards
> 
> Andy
> 
> transformed df printSchema()
> 
> root
> 
>  |-- id: integer (nullable = false)
> 
>  |-- label: double (nullable = false)
> 
>  |-- words: array (nullable = false)
> 
>  ||-- element: string (containsNull = true)
> 
>  |-- tf: vector (nullable = true)
> 
>  |-- idf: vector (nullable = true)
> 
> 
> 
> +---+-----+----------------------------+-------------------------+-------------------------------------------------------+
> |id |label|words                       |tf                       |idf                                                    |
> +---+-----+----------------------------+-------------------------+-------------------------------------------------------+
> |0  |0.0  |[Chinese, Beijing, Chinese] |(7,[1,2],[2.0,1.0])      |(7,[1,2],[0.0,0.9162907318741551])                     |
> |1  |0.0  |[Chinese, Chinese, Shanghai]|(7,[1,4],[2.0,1.0])      |(7,[1,4],[0.0,0.9162907318741551])                     |
> |2  |0.0  |[Chinese, Macao]            |(7,[1,6],[1.0,1.0])      |(7,[1,6],[0.0,0.9162907318741551])                     |
> |3  |1.0  |[Tokyo, Japan, Chinese]     |(7,[1,3,5],[1.0,1.0,1.0])|(7,[1,3,5],[0.0,0.9162907318741551,0.9162907318741551])|
> +---+-----+----------------------------+-------------------------+-------------------------------------------------------+
> 
> 
> @Test
> 
> public void test() {
> 
> DataFrame rawTrainingDF = createTrainingData();
> 
> DataFrame trainingDF = runPipleLineTF_IDF(rawTrainingDF);
> 
> . . .
> 
> }
> 
>private DataFrame runPipleLineTF_IDF(DataFrame rawDF) {
> 
> HashingTF hashingTF = new HashingTF()
> 
> .setInputCol("words")
> 
> .setOutputCol("tf")
> 
> .setNumFeatures(dictionarySize);
> 
> 
> 
> DataFrame termFrequenceDF = hashingTF.transform(rawDF);
> 
> 
> 
> termFrequenceDF.cache(); // idf needs to make 2 passes over data set
> 
> IDFModel idf = new IDF()
> 
> //.setMinDocFreq(1) // our vocabulary has 6 words we
> hash into 7
> 
> .setInputCol(hashingTF.getOutputCol())
> 
> .setOutputCol("idf")
> 
> .fit(termFrequenceDF);
> 
> 
> 
> DataFrame tmp = idf.transform(termFrequenceDF);
> 
> 
> 
> DataFrame ret = tmp.withColumn("features",
> tmp.col("tf").multiply(tmp.col("idf")));
> 
> logger.warn("\ntransformed df printSchema()");
> 
> ret.printSchema();
> 
> ret.show(false);
> 
> 
> 
> return ret;
> 
> }
> 
> 
> 
> org.apache.spark.sql.AnalysisException: cannot resolve '(tf * idf)' due to
> data type mismatch: '(tf * idf)' requires numeric type, not vector;
> 
> 
> 
> 
> 
> private DataFrame createTrainingData() {
> 
> // make sure we only use dictionarySize words
> 
> JavaRDD rdd = javaSparkContext.parallelize(Arrays.asList(
> 
> // 0 is Chinese
> 
> // 1 in notChinese
> 
> RowFactory

trouble calculating TF-IDF data type mismatch: '(tf * idf)' requires numeric type, not vector;

2016-01-13 Thread Andy Davidson
Below is a little snippet of my Java test code. Any idea how I implement
member-wise vector multiplication?

Also notice the idf value for 'Chinese' is 0.0? The calculation is ln((4+1)
/ (6/4 + 1)) = ln(2) = 0.6931 ??
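
(For what it's worth, my reading of the MLlib docs rather than anything from this
thread: spark.mllib computes idf(t) = ln((m + 1) / (df(t) + 1)), where m is the
number of documents and df(t) is the number of documents containing the term.
"Chinese" appears in all 4 documents, so idf = ln(5 / 5) = 0, which matches the 0.0
above; ln(2) would correspond to a document frequency of 1.5, but df counts
documents, not term occurrences.)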

Also, any idea if this code would work in a pipeline? I.e., is the pipeline
smart about using cache()?

Kind regards

Andy

transformed df printSchema()

root

 |-- id: integer (nullable = false)

 |-- label: double (nullable = false)

 |-- words: array (nullable = false)

 |    |-- element: string (containsNull = true)

 |-- tf: vector (nullable = true)

 |-- idf: vector (nullable = true)



+---+-----+----------------------------+-------------------------+-------------------------------------------------------+
|id |label|words                       |tf                       |idf                                                    |
+---+-----+----------------------------+-------------------------+-------------------------------------------------------+
|0  |0.0  |[Chinese, Beijing, Chinese] |(7,[1,2],[2.0,1.0])      |(7,[1,2],[0.0,0.9162907318741551])                     |
|1  |0.0  |[Chinese, Chinese, Shanghai]|(7,[1,4],[2.0,1.0])      |(7,[1,4],[0.0,0.9162907318741551])                     |
|2  |0.0  |[Chinese, Macao]            |(7,[1,6],[1.0,1.0])      |(7,[1,6],[0.0,0.9162907318741551])                     |
|3  |1.0  |[Tokyo, Japan, Chinese]     |(7,[1,3,5],[1.0,1.0,1.0])|(7,[1,3,5],[0.0,0.9162907318741551,0.9162907318741551])|
+---+-----+----------------------------+-------------------------+-------------------------------------------------------+


@Test
public void test() {
    DataFrame rawTrainingDF = createTrainingData();
    DataFrame trainingDF = runPipleLineTF_IDF(rawTrainingDF);
    . . .
}

private DataFrame runPipleLineTF_IDF(DataFrame rawDF) {
    HashingTF hashingTF = new HashingTF()
            .setInputCol("words")
            .setOutputCol("tf")
            .setNumFeatures(dictionarySize);

    DataFrame termFrequenceDF = hashingTF.transform(rawDF);

    termFrequenceDF.cache(); // idf needs to make 2 passes over data set
    IDFModel idf = new IDF()
            //.setMinDocFreq(1) // our vocabulary has 6 words we hash into 7
            .setInputCol(hashingTF.getOutputCol())
            .setOutputCol("idf")
            .fit(termFrequenceDF);

    DataFrame tmp = idf.transform(termFrequenceDF);

    DataFrame ret = tmp.withColumn("features",
            tmp.col("tf").multiply(tmp.col("idf")));
    logger.warn("\ntransformed df printSchema()");
    ret.printSchema();
    ret.show(false);

    return ret;
}



org.apache.spark.sql.AnalysisException: cannot resolve '(tf * idf)' due to
data type mismatch: '(tf * idf)' requires numeric type, not vector;
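
Since Column.multiply is only defined for numeric columns, the usual way to get a
member-wise product of two vector columns is a small UDF. A sketch, not from the
thread, written against the Spark 1.6-era API used here (with Spark 2.x+ the vector
class comes from org.apache.spark.ml.linalg instead):

// Sketch only: element-wise product of the tf and idf vector columns via a UDF;
// "tmp" is the DataFrame produced by idf.transform(termFrequenceDF) above.
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.udf

val elementwiseProduct = udf { (tf: Vector, idf: Vector) =>
  Vectors.dense(tf.toArray.zip(idf.toArray).map { case (a, b) => a * b })
}

val withFeatures = tmp.withColumn("features", elementwiseProduct(tmp.col("tf"), tmp.col("idf")))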





private DataFrame createTrainingData() {
    // make sure we only use dictionarySize words
    JavaRDD rdd = javaSparkContext.parallelize(Arrays.asList(
            // 0 is Chinese
            // 1 is notChinese
            RowFactory.create(0, 0.0, Arrays.asList("Chinese", "Beijing", "Chinese")),
            RowFactory.create(1, 0.0, Arrays.asList("Chinese", "Chinese", "Shanghai")),
            RowFactory.create(2, 0.0, Arrays.asList("Chinese", "Macao")),
            RowFactory.create(3, 1.0, Arrays.asList("Tokyo", "Japan", "Chinese"))));

    return createData(rdd);
}

private DataFrame createTestData() {
    JavaRDD rdd = javaSparkContext.parallelize(Arrays.asList(
            // 0 is Chinese
            // 1 is notChinese
            // "bernoulli" requires label to be IntegerType
            RowFactory.create(4, 1.0,
                    Arrays.asList("Chinese", "Chinese", "Chinese", "Tokyo", "Japan"))));

    return createData(rdd);
}




adding element into MutableList throws an error type mismatch

2014-10-15 Thread Henry Hung
Hi All,

Could someone shed some light on why adding an element to a MutableList can
result in a type mismatch, even though I'm sure the class type is right?

Below is the sample code I ran in the Spark 1.0.2 console; at the last line
there is a type mismatch error:



Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.0.2
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
14/10/15 14:36:39 INFO spark.SecurityManager: Changing view acls to: hadoop
14/10/15 14:36:39 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(hadoop)
14/10/15 14:36:39 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/10/15 14:36:39 INFO Remoting: Starting remoting
14/10/15 14:36:39 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://sp...@fphd4.ctpilot1.com:35293]
14/10/15 14:36:39 INFO Remoting: Remoting now listens on addresses: 
[akka.tcp://sp...@fphd4.ctpilot1.com:35293]
14/10/15 14:36:39 INFO spark.SparkEnv: Registering MapOutputTracker
14/10/15 14:36:39 INFO spark.SparkEnv: Registering BlockManagerMaster
14/10/15 14:36:39 INFO storage.DiskBlockManager: Created local directory at 
/tmp/spark-local-20141015143639-c62e
14/10/15 14:36:39 INFO storage.MemoryStore: MemoryStore started with capacity 
294.4 MB.
14/10/15 14:36:39 INFO network.ConnectionManager: Bound socket to port 43236 
with id = ConnectionManagerId(fphd4.ctpilot1.com,43236)
14/10/15 14:36:39 INFO storage.BlockManagerMaster: Trying to register 
BlockManager
14/10/15 14:36:39 INFO storage.BlockManagerInfo: Registering block manager 
fphd4.ctpilot1.com:43236 with 294.4 MB RAM
14/10/15 14:36:39 INFO storage.BlockManagerMaster: Registered BlockManager
14/10/15 14:36:39 INFO spark.HttpServer: Starting HTTP Server
14/10/15 14:36:39 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/10/15 14:36:40 INFO server.AbstractConnector: Started 
SocketConnector@0.0.0.0:37164
14/10/15 14:36:40 INFO broadcast.HttpBroadcast: Broadcast server started at 
http://10.18.30.154:37164
14/10/15 14:36:40 INFO spark.HttpFileServer: HTTP File server directory is 
/tmp/spark-34fc70ab-7c5d-4e79-9ae7-929fd47d4f36
14/10/15 14:36:40 INFO spark.HttpServer: Starting HTTP Server
14/10/15 14:36:40 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/10/15 14:36:40 INFO server.AbstractConnector: Started 
SocketConnector@0.0.0.0:47025
14/10/15 14:36:40 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/10/15 14:36:40 INFO server.AbstractConnector: Started 
SelectChannelConnector@0.0.0.0:4040
14/10/15 14:36:40 INFO ui.SparkUI: Started SparkUI at 
http://fphd4.ctpilot1.com:4040
14/10/15 14:36:40 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
14/10/15 14:36:40 INFO executor.Executor: Using REPL class URI: 
http://10.18.30.154:49669
14/10/15 14:36:40 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

scala> case class Dummy(x: String) {
     |   val data: String = x
     | }
defined class Dummy

scala> import scala.collection.mutable.MutableList
import scala.collection.mutable.MutableList

scala> val v = MutableList[Dummy]()
v: scala.collection.mutable.MutableList[Dummy] = MutableList()

scala> v += (new Dummy("a"))
<console>:16: error: type mismatch;
 found   : Dummy
 required: Dummy
       v += (new Dummy("a"))




Re: adding element into MutableList throws an error type mismatch

2014-10-15 Thread Sean Owen
Another instance of https://issues.apache.org/jira/browse/SPARK-1199 ,
fixed in subsequent versions.
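
For anyone stuck on an affected version, one workaround (an assumption on my part,
not from Sean's reply) is to keep the class in regular compiled code on the shell's
classpath rather than defining it inside the REPL, e.g.:

// Sketch only: compile into a jar and add it to the shell's classpath (for example
// via the --jars / ADD_JARS mechanism of the Spark version in use), then import it.
package com.example

case class Dummy(x: String) {
  val data: String = x
}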

On Wed, Oct 15, 2014 at 7:40 AM, Henry Hung ythu...@winbond.com wrote:
 Hi All,



 Could someone shed some light on why adding an element to a MutableList can
 result in a type mismatch, even though I'm sure the class type is right?



 Below is the sample code I ran in the Spark 1.0.2 console; at the last line
 there is a type mismatch error:







 Welcome to

     __

  / __/__  ___ _/ /__

 _\ \/ _ \/ _ `/ __/  '_/

/___/ .__/\_,_/_/ /_/\_\   version 1.0.2

   /_/



 Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
 1.6.0_45)

 Type in expressions to have them evaluated.

 Type :help for more information.

 14/10/15 14:36:39 INFO spark.SecurityManager: Changing view acls to: hadoop

 14/10/15 14:36:39 INFO spark.SecurityManager: SecurityManager:
 authentication disabled; ui acls disabled; users with view permissions:
 Set(hadoop)

 14/10/15 14:36:39 INFO slf4j.Slf4jLogger: Slf4jLogger started

 14/10/15 14:36:39 INFO Remoting: Starting remoting

 14/10/15 14:36:39 INFO Remoting: Remoting started; listening on addresses
 :[akka.tcp://sp...@fphd4.ctpilot1.com:35293]

 14/10/15 14:36:39 INFO Remoting: Remoting now listens on addresses:
 [akka.tcp://sp...@fphd4.ctpilot1.com:35293]

 14/10/15 14:36:39 INFO spark.SparkEnv: Registering MapOutputTracker

 14/10/15 14:36:39 INFO spark.SparkEnv: Registering BlockManagerMaster

 14/10/15 14:36:39 INFO storage.DiskBlockManager: Created local directory at
 /tmp/spark-local-20141015143639-c62e

 14/10/15 14:36:39 INFO storage.MemoryStore: MemoryStore started with
 capacity 294.4 MB.

 14/10/15 14:36:39 INFO network.ConnectionManager: Bound socket to port 43236
 with id = ConnectionManagerId(fphd4.ctpilot1.com,43236)

 14/10/15 14:36:39 INFO storage.BlockManagerMaster: Trying to register
 BlockManager

 14/10/15 14:36:39 INFO storage.BlockManagerInfo: Registering block manager
 fphd4.ctpilot1.com:43236 with 294.4 MB RAM

 14/10/15 14:36:39 INFO storage.BlockManagerMaster: Registered BlockManager

 14/10/15 14:36:39 INFO spark.HttpServer: Starting HTTP Server

 14/10/15 14:36:39 INFO server.Server: jetty-8.y.z-SNAPSHOT

 14/10/15 14:36:40 INFO server.AbstractConnector: Started
 SocketConnector@0.0.0.0:37164

 14/10/15 14:36:40 INFO broadcast.HttpBroadcast: Broadcast server started at
 http://10.18.30.154:37164

 14/10/15 14:36:40 INFO spark.HttpFileServer: HTTP File server directory is
 /tmp/spark-34fc70ab-7c5d-4e79-9ae7-929fd47d4f36

 14/10/15 14:36:40 INFO spark.HttpServer: Starting HTTP Server

 14/10/15 14:36:40 INFO server.Server: jetty-8.y.z-SNAPSHOT

 14/10/15 14:36:40 INFO server.AbstractConnector: Started
 SocketConnector@0.0.0.0:47025

 14/10/15 14:36:40 INFO server.Server: jetty-8.y.z-SNAPSHOT

 14/10/15 14:36:40 INFO server.AbstractConnector: Started
 SelectChannelConnector@0.0.0.0:4040

 14/10/15 14:36:40 INFO ui.SparkUI: Started SparkUI at
 http://fphd4.ctpilot1.com:4040

 14/10/15 14:36:40 WARN util.NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable

 14/10/15 14:36:40 INFO executor.Executor: Using REPL class URI:
 http://10.18.30.154:49669

 14/10/15 14:36:40 INFO repl.SparkILoop: Created spark context..

 Spark context available as sc.



 scala> case class Dummy(x: String) {
      |   val data: String = x
      | }
 defined class Dummy

 scala> import scala.collection.mutable.MutableList
 import scala.collection.mutable.MutableList

 scala> val v = MutableList[Dummy]()
 v: scala.collection.mutable.MutableList[Dummy] = MutableList()

 scala> v += (new Dummy("a"))
 <console>:16: error: type mismatch;
  found   : Dummy
  required: Dummy
        v += (new Dummy("a"))


 

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: error: type mismatch while Union

2014-09-08 Thread Dhimant
Thank you, Aaron, for pointing out the problem. This only happens when I run this
code in spark-shell, but not when I submit the job.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/error-type-mismatch-while-Union-tp13547p13677.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: error: type mismatch while Union

2014-09-06 Thread Dhimant
I am using Spark version 1.0.2




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/error-type-mismatch-while-Union-tp13547p13618.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: error: type mismatch while Union

2014-09-06 Thread Aaron Davidson
Are you doing this from the spark-shell? You're probably running into
https://issues.apache.org/jira/browse/SPARK-1199 which should be fixed in
1.1.


On Sat, Sep 6, 2014 at 3:03 AM, Dhimant dhimant84.jays...@gmail.com wrote:

 I am using Spark version 1.0.2




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/error-type-mismatch-while-Union-tp13547p13618.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: error: type mismatch while Union

2014-09-05 Thread Yana Kadiyska
Which version are you using -- I can reproduce your issue w/ 0.9.2 but not
with 1.0.1...so my guess is that it's a bug and the fix hasn't been
backported... No idea on a workaround though..


On Fri, Sep 5, 2014 at 7:58 AM, Dhimant dhimant84.jays...@gmail.com wrote:

 Hi,
  I am getting a type mismatch error during a union operation.
  Can someone suggest a solution?

  case class MyNumber(no: Int, secondVal: String) extends Serializable
      with Ordered[MyNumber] {
    override def toString(): String = this.no.toString + " " + this.secondVal
    override def compare(that: MyNumber): Int = this.no compare that.no
    override def compareTo(that: MyNumber): Int = this.no compare that.no
    def Equals(that: MyNumber): Boolean = {
      (this.no == that.no) && (that match {
        case MyNumber(n1, n2) => n1 == no && n2 == secondVal
        case _ => false
      })
    }
  }
  val numbers = sc.parallelize(1 to 20, 10)
  val firstRdd = numbers.map(new MyNumber(_, "A"))
  val secondRDD = numbers.map(new MyNumber(_, "B"))
  val numberRdd = firstRdd.union(secondRDD)
  <console>:24: error: type mismatch;
   found   : org.apache.spark.rdd.RDD[MyNumber]
   required: org.apache.spark.rdd.RDD[MyNumber]
     val numberRdd = onenumberRdd.union(anotherRDD)



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/error-type-mismatch-while-Union-tp13547.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




error: type mismatch while assigning RDD to RDD val object

2014-09-04 Thread Dhimant
I am receiving the following error in spark-shell while executing the code below.

class LogRecrod(logLine: String) extends Serializable {
  val splitvals = logLine.split(",");
  val strIp: String = splitvals(0)
  val hostname: String = splitvals(1)
  val server_name: String = splitvals(2)
}

var logRecordRdd: org.apache.spark.rdd.RDD[LogRecrod] = _

val sourceFile =
  sc.textFile("hdfs://192.168.1.30:9000/Data/Log_1406794333258.log", 2)
14/09/04 12:08:28 INFO storage.MemoryStore: ensureFreeSpace(179585) called
with curMem=0, maxMem=309225062
14/09/04 12:08:28 INFO storage.MemoryStore: Block broadcast_0 stored as
values to memory (estimated size 175.4 KB, free 294.7 MB)
sourceFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at
<console>:12


scala> logRecordRdd = sourceFile.map(line => new LogRecrod(line))
<console>:18: error: type mismatch;
 found   : LogRecrod
 required: LogRecrod
       logRecordRdd = sourceFile.map(line => new LogRecrod(line))

Any suggestions to resolve this problem?




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/error-type-mismatch-while-assigning-RDD-to-RDD-val-object-tp13429.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: error: type mismatch while assigning RDD to RDD val object

2014-09-04 Thread Sean Owen
I think this is a known problem with the shell and case classes. Have
a look at JIRA.
https://issues.apache.org/jira/browse/SPARK-1199

On Thu, Sep 4, 2014 at 7:56 AM, Dhimant dhimant84.jays...@gmail.com wrote:
 I am receiving following error in Spark-Shell while executing following code.

  class LogRecrod(logLine: String) extends Serializable {
    val splitvals = logLine.split(",");
    val strIp: String = splitvals(0)
    val hostname: String = splitvals(1)
    val server_name: String = splitvals(2)
  }

  var logRecordRdd: org.apache.spark.rdd.RDD[LogRecrod] = _

  val sourceFile =
    sc.textFile("hdfs://192.168.1.30:9000/Data/Log_1406794333258.log", 2)
  14/09/04 12:08:28 INFO storage.MemoryStore: ensureFreeSpace(179585) called
  with curMem=0, maxMem=309225062
  14/09/04 12:08:28 INFO storage.MemoryStore: Block broadcast_0 stored as
  values to memory (estimated size 175.4 KB, free 294.7 MB)
  sourceFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at
  <console>:12


  scala> logRecordRdd = sourceFile.map(line => new LogRecrod(line))
  <console>:18: error: type mismatch;
   found   : LogRecrod
   required: LogRecrod
         logRecordRdd = sourceFile.map(line => new LogRecrod(line))

 Any suggestions to resolve this problem?




 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/error-type-mismatch-while-assigning-RDD-to-RDD-val-object-tp13429.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Custom Accumulator: Type Mismatch Error

2014-05-24 Thread Muttineni, Vinay
Hello,

I have been trying to implement a custom accumulator as below



import org.apache.spark._

class VectorNew1(val data: Array[Double]) {}

implicit object VectorAP extends AccumulatorParam[VectorNew1] {
  def zero(v: VectorNew1) = new VectorNew1(new Array(v.data.size))
  def addInPlace(v1: VectorNew1, v2: VectorNew1) = {
    for (i <- 0 to v1.data.size - 1) v1.data(i) += v2.data(i)
    v1
  }
}

// Create an accumulator counter of length = number of columns, which is in
// turn derived from the header
val actualCounters1 = sc.accumulator(new VectorNew1(Array.fill[Double](2)(0)))

onlySplitFile.foreach(oneRow => {
  //println("Here")
  //println(oneRow(0))
  //for (eachColumnValue <- oneRow)
  //{
  actualCounters1 += new VectorNew1(Array(1, 1))
  //}
})







I am receiving the following error

Error: type mismatch;

Found : VectorNew1

Required : vectorNew1



actualCounters1 += new VectorNew1(Array(1,1))



Could someone help me with this?

Thank You

Vinay
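
A follow-up note (an assumption on my part, not from the thread): an error where the
found type "VectorNew1" and the required type "vectorNew1" differ only in case
usually means two different definitions ended up in scope, which is easy to do in
the shell after redefining a class. Passing the AccumulatorParam explicitly also
removes any doubt about which implicit gets picked up; a sketch:

// Sketch only: supply the AccumulatorParam explicitly instead of relying on the implicit.
val actualCounters1 = sc.accumulator(new VectorNew1(Array.fill[Double](2)(0.0)))(VectorAP)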