You can't use these names because of limitations in Parquet (the library
itself will silently generate corrupt files that can't be read, hence the
error we throw).

You can alias a column with df.select(df("old").alias("new")), which is
essentially what withColumnRenamed does. Alias in this case means renaming.
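
For PySpark, a minimal sketch of renaming every offending column in one
pass might look like the following. The helper names sanitize_column_name
and rename_invalid_columns are hypothetical (they are not part of Spark),
and the character set is copied from the error message quoted below. It
assumes df is a PySpark DataFrame, that no column name contains a dot, and
that no two names collide after replacement.

```python
import re

# Characters listed in the AnalysisException: space , ; { } ( ) \n \t =
_INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_column_name(name, replacement="_"):
    """Hypothetical helper: replace each invalid character with `replacement`."""
    return _INVALID_CHARS.sub(replacement, name)


def rename_invalid_columns(df):
    """Return a new DataFrame in which every column is aliased to a safe name.

    Assumes `df` is a PySpark DataFrame; df[c] looks the column up by name
    and .alias(...) renames it in the selected output.
    """
    return df.select([df[c].alias(sanitize_column_name(c)) for c in df.columns])
```

Something like rename_invalid_columns(df).write.parquet(path) should then
get past the check. Both select and withColumnRenamed return a new
DataFrame rather than modifying df, so the result has to be assigned or
used directly.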

On Thu, Jul 30, 2015 at 11:49 AM, angelini <alex.angel...@shopify.com>
wrote:

> Hi all,
>
> Our data has lots of human readable column names (names that include
> spaces), is it possible to use these with Parquet and Dataframes?
>
> When I try and write the Dataframe I get the following error:
>
> (I am using PySpark)
>
> `AnalysisException: Attribute name "Name with Space" contains invalid
> character(s) among " ,;{}()\n\t=". Please use alias to rename it.`
>
> How can I alias that column name?
>
> `df['Name with Space'] = df['Name with Space'].alias('Name')` doesn't work
> as you can't assign to a dataframe column.
>
> `df.withColumnRenamed('Name with Space', 'Name')` overwrites the column and
> doesn't alias it.
>
> Any ideas?
>
> Thanks
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Parquet-Dataframes-Column-names-with-spaces-tp24088.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
