value of sc.defaultParallelism

2015-12-23 Thread Chang Ya-Hsuan
python version: 2.7.9
os: ubuntu 14.04
spark: 1.5.2

I run a standalone Spark on localhost and use the following code to access sc.defaultParallelism:

```
# a.py
import pyspark
sc = pyspark.SparkContext()
print(sc.defaultParallelism)
```

and use the following command to submit:

$ spark-submit --master
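The submit command above is cut off in the archive. As a hedged illustration of what defaultParallelism reflects (the config value below is an assumption for demonstration, not from the original message), the value can be pinned explicitly and compared against the master-derived default:

```
# minimal sketch: pin spark.default.parallelism and observe the result
import pyspark

conf = pyspark.SparkConf().set('spark.default.parallelism', '8')  # illustrative value
sc = pyspark.SparkContext(conf=conf)
print(sc.defaultParallelism)  # 8 when pinned; otherwise derived from the
                              # master (e.g. the core count in local mode)
```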

confused behavior about pyspark.sql, Row, schema, and createDataFrame

2015-12-23 Thread Chang Ya-Hsuan
python version: 2.7.9
os: ubuntu 14.04
spark: 1.5.2

```
import pyspark
from pyspark.sql import Row
from pyspark.sql.types import StructType, IntegerType

sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
schema1 = StructType() \
    .add('a', IntegerType()) \
    .add('b', IntegerType())
```
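The preview is truncated, but the title points at a well-known gotcha in this area: a Row built from keyword arguments sorts its fields alphabetically, while createDataFrame matches rows to a schema by position. A minimal sketch of that mismatch (the values and the second schema are illustrative assumptions, not quoted from the thread):

```
import pyspark
from pyspark.sql import Row
from pyspark.sql.types import StructType, IntegerType

sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)

row = Row(b=2, a=1)   # keyword fields are sorted by name: Row(a=1, b=2)
schema = StructType().add('b', IntegerType()).add('a', IntegerType())
df = sqlc.createDataFrame([row], schema)
df.show()
# rows are bound to the schema by position, not by field name, so the
# value written as a=1 lands in column 'b'
```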

Re: does spark really support label expr like && or || ?

2015-12-16 Thread Chang Ya-Hsuan
are you trying to do a DataFrame boolean expression? please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

example:

>>> df = sqlContext.range(10)
>>> df.where((df.id == 1) | ~(df.id == 1))
DataFrame[id: bigint]

On Wed, Dec 16, 2015 at 4:32 PM, Allen
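A related pitfall worth noting alongside the reply (this addition is illustrative, not part of the original message): the bitwise operators bind more tightly than comparisons, so each comparison needs its own parentheses, and the plain Python keywords fail outright:

```
>>> # '&' and '|' bind tighter than '==', so parenthesize each comparison
>>> df.where((df.id == 1) | (df.id == 2))
DataFrame[id: bigint]
>>> # without parentheses, df.id == 1 | df.id == 2 parses as df.id == (1 | df.id) == 2;
>>> # plain 'and'/'or' raise "Cannot convert column into bool" on a Column
```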

Failed to generate predicate Error when using dropna

2015-12-08 Thread Chang Ya-Hsuan
spark version: spark-1.5.2-bin-hadoop2.6
python version: 2.7.9
os: ubuntu 14.04

code to reproduce the error:

```
# write.py
import pyspark
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df = sqlc.range(10)
df1 = df.withColumn('a', df['id'] * 2)
df1.write.partitionBy('id').parquet('./data')
```
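The preview stops after the write step. A plausible continuation that would trigger the error named in the subject, reconstructed from the title and the follow-up's SPARK-12231 ticket rather than quoted from the message:

```
# read.py -- hypothetical continuation; the archived preview is truncated
import pyspark
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df = sqlc.read.parquet('./data')
df.dropna().show()  # per the thread title, the "Failed to generate
                    # predicate" error would surface here
```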

Re: Failed to generate predicate Error when using dropna

2015-12-08 Thread Chang Ya-Hsuan
https://issues.apache.org/jira/browse/SPARK-12231

This is my first time creating a JIRA ticket. Is this ticket proper? Thanks.

On Tue, Dec 8, 2015 at 9:59 PM, Reynold Xin <r...@databricks.com> wrote:
> Can you create a JIRA ticket for this? Thanks.
>
> On Tue, Dec 8, 2015 at

Re: pyspark with pypy not work for spark-1.5.1

2015-11-06 Thread Chang Ya-Hsuan
wrote:
> You could try running PySpark's own unit tests. Try ./python/run-tests
> --help for instructions.
>
> On Thu, Nov 5, 2015 at 12:31 AM Chang Ya-Hsuan <sumti...@gmail.com> wrote:
>
>> I've tested the following PyPy versions against spark-1.5.1
>>

Re: pyspark with pypy not work for spark-1.5.1

2015-11-05 Thread Chang Ya-Hsuan
> helping to investigate so that we can update the documentation or produce
> a fix to restore compatibility with earlier PyPy builds?
>
> On Wed, Nov 4, 2015 at 11:56 PM, Chang Ya-Hsuan <sumti...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am trying to run pyspar

Re: pyspark with pypy not work for spark-1.5.1

2015-11-05 Thread Chang Ya-Hsuan
to run advanced tests?

On Thu, Nov 5, 2015 at 4:14 PM, Chang Ya-Hsuan <sumti...@gmail.com> wrote:
> Thanks for your quick reply.
>
> I will test several pypy versions and report the result later.
>
> On Thu, Nov 5, 2015 at 4:06 PM, Josh Rosen <rosenvi...@gmail.co

pyspark with pypy not work for spark-1.5.1

2015-11-04 Thread Chang Ya-Hsuan
Hi all,

I am trying to run pyspark with pypy. It works with spark-1.3.1 but fails with spark-1.4.1 and spark-1.5.1.

my pypy version:

$ /usr/bin/pypy --version
Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
[PyPy 2.2.1 with GCC 4.8.4]

works with spark-1.3.1:

$
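The commands in the original message are truncated. For context, a hedged sketch of the standard way to point PySpark at an alternative interpreter (PYSPARK_PYTHON is the documented variable; the paths are illustrative):

```
$ PYSPARK_PYTHON=/usr/bin/pypy ./bin/pyspark
$ PYSPARK_PYTHON=/usr/bin/pypy ./bin/spark-submit a.py
```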