The class (called Row) for rows from Spark SQL is created on the fly and is 
different from pyspark.sql.Row (which is the public API for users to create 
Rows).  

The reason we did it this way is to get better performance when accessing the 
columns. Basically, the rows are just named tuples (called `Row`).  
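
Very roughly, the idea looks like the sketch below. This is only an
illustration, not the actual Spark code; make_row_class and the field names
are made up. A fresh named-tuple class called "Row" is generated per schema,
so column access is just a tuple lookup, but the generated class is not the
same class object as the public pyspark.sql.Row, which is why your type() and
isinstance() checks come out False.

# Illustrative sketch only -- not Spark's implementation. make_row_class is a
# hypothetical helper showing the general idea.
from collections import namedtuple

def make_row_class(field_names):
    # A new class named "Row" is generated for this particular schema.
    return namedtuple("Row", field_names)

RowForMySchema = make_row_class(["name", "age"])
row = RowForMySchema("Nick", 30)

print(row.name)             # fast column access -> Nick
print(type(row).__name__)   # prints "Row", but it is a generated class

# Because the class is generated per schema, it is a different class object
# from the public pyspark.sql.Row / pyspark.sql.types.Row, so both
# type(a) == pyspark.sql.types.Row and isinstance(a, pyspark.sql.types.Row)
# evaluate to False, as in your session.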

--  
Davies Liu

On Tuesday, May 12, 2015 at 4:49 AM, Nicholas Chammas wrote:

> This is really strange.
>  
> >>> # Spark 1.3.1
> >>> print type(results)
> <class 'pyspark.sql.dataframe.DataFrame'>
>
> >>> a = results.take(1)[0]
> >>> print type(a)
> <class 'pyspark.sql.types.Row'>
>
> >>> print pyspark.sql.types.Row
> <class 'pyspark.sql.types.Row'>
>
> >>> print type(a) == pyspark.sql.types.Row
> False
>
> >>> print isinstance(a, pyspark.sql.types.Row)
> False
>  
> If I set a as follows, then the type checks pass fine.
>  
> a = pyspark.sql.types.Row('name')('Nick')
>  
> Is this a bug? What can I do to narrow down the source?
>  
> results is a massive DataFrame of spark-perf results.
>  
> Nick
>  
>  

