Spark R guidelines for non-spark functions and coxph (Cox Regression for Time-Dependent Covariates)

2016-11-15 Thread pietrop
Hi all, I'm writing here after some intensive use of pyspark and SparkSQL. I would like to use a well-known function in the R world: coxph() from the survival package. From what I understood, I can't parallelize a function like coxph() because it isn't provided with the SparkR package. In other

TaskMemoryManager: Failed to allocate a page

2016-10-27 Thread pietrop
I'm running an ETL process that joins table1 with other tables (CSV files), one table at a time (for example table1 with table2, table1 with table3, and so on). The join is written to a PostgreSQL instance using JDBC. The entire process runs successfully if I use table2, table3 and table4. If I

pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread pietrop
Hi there, I opened a question on StackOverflow at this link: http://stackoverflow.com/questions/40007972/pyspark-doesnt-recognize-mmm-dateformat-pattern-in-spark-read-load-for-dates?noredirect=1#comment67297930_40007972 I didn’t get any useful answer, so I’m writing here hoping that someone can
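
For reference, the two date shapes named in the thread subject (1989Dec31 and 31Dec1989) can be parsed in plain Python. This is only a minimal sketch of one possible workaround, not the answer given in the thread: it assumes the column is read as a string and converted afterwards. The helper name parse_compact_date is hypothetical, and %b assumes English month abbreviations (C or English locale).

```python
from datetime import date, datetime

# Candidate layouts for the two shapes in the thread subject:
# year-month-day (1989Dec31) and day-month-year (31Dec1989).
# Assumption: %b is locale-dependent; English abbreviations are expected.
CANDIDATE_FORMATS = ("%Y%b%d", "%d%b%Y")


def parse_compact_date(text):
    """Hypothetical helper: return a date, or None if no layout matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            # This layout did not match; try the next one.
            continue
    return None


if __name__ == "__main__":
    print(parse_compact_date("1989Dec31"))
    print(parse_compact_date("31Dec1989"))
```

In pyspark, a function like this could be wrapped with pyspark.sql.functions.udf and a DateType return type, then applied to a string column. Whether that is preferable to a reader-level dateFormat option depends on the Spark version in use.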