For cast, you can use the selectExpr method. For example,
df.selectExpr("cast(col1 as int) as col1", "cast(col2 as bigint) as col2").
Or, df.select(df("colA").cast("int"), ...)
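A minimal sketch of both approaches, assuming the Spark shell (so sqlContext is in scope) and illustrative column names:

import sqlContext.implicits._

// All-string input, as you would get from a csv source
val raw = Seq(("1", "100")).toDF("col1", "col2")

// SQL-style casts via selectExpr
val typed1 = raw.selectExpr("cast(col1 as int) as col1", "cast(col2 as bigint) as col2")

// Equivalent Column-based casts
val typed2 = raw.select(raw("col1").cast("int"), raw("col2").cast("bigint"))

typed1.printSchema()  // col1: integer, col2: bigint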
On Thu, Apr 2, 2015 at 8:33 PM, Michael Armbrust mich...@databricks.com
wrote:
val df = Seq(("test", 1)).toDF("col1", "col2")
You can use SQL style expressions as a string:
df.filter("col1 IS NOT NULL").collect()
res1: Array[org.apache.spark.sql.Row] = Array([test,1])
Or you can also reference columns using df("colName"), $"colName", or
col("colName"):
df.filter(df("col1") === "test").collect()
res2: Array[org.apache.spark.sql.Row] = Array([test,1])
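To spell out the three equivalent column references (a sketch; col comes from org.apache.spark.sql.functions, and sqlContext.implicits._ provides the $ shorthand):

import org.apache.spark.sql.functions.col
import sqlContext.implicits._

df.filter(df("col1") === "test")   // apply method on the DataFrame
df.filter($"col1" === "test")      // $ string interpolator
df.filter(col("col1") === "test")  // col function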
On Thu, Apr 2, 2015 at 7:45 PM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
Hi folks, having some seemingly noob issues with the dataframe API.
I have a DF which came from the csv package.
1. What would be an easy way to cast a column to a given type -- my DF
columns are all typed as strings coming from a csv. I see a schema getter
but not setter on DF
2. I am trying to use the syntax shown in various blog posts but can't
figure out how to reference a column by name:
scala> df.filter("customer_id" != "")
<console>:23: error: overloaded method value filter with alternatives:
  (conditionExpr: String)org.apache.spark.sql.DataFrame <and>
  (condition: org.apache.spark.sql.Column)org.apache.spark.sql.DataFrame
 cannot be applied to (Boolean)
       df.filter("customer_id" != "")
3. What would be the recommended way to drop a row containing a null
value -- is it possible to do this:
scala> df.filter("customer_id IS NOT NULL")
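For completeness, a sketch of the null-dropping options on a hypothetical customer_id column (the string predicate is the form Michael confirms above; df.na.drop() is available from Spark 1.3.1 onward):

df.filter("customer_id IS NOT NULL")    // SQL-style string predicate
df.filter(df("customer_id").isNotNull)  // Column-based predicate
df.na.drop()                            // drop rows containing any null (Spark 1.3.1+)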