Re: access a broadcasted variable from within ForeachPartitionFunction Java API

2017-06-23 Thread Anton Kravchenko
I think Broadcast itself can be serialized. You can get the value out on the driver side and refer to it in foreach; then the value would be serialized with the lambda expression and sent to workers. On Fri, Jun 16, 2017 at 2:29 AM, Anton Kravchenko <kravchenko
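
A minimal Java sketch of the advice above, assuming bcv and df_sql are the Broadcast<Integer> and Dataset<Row> from the question below; the value is read once on the driver and the closure captures the plain value:

    final Integer value = bcv.value();  // read on the driver side
    df_sql.foreachPartition(new ForeachPartitionFunction<Row>() {
        @Override
        public void call(Iterator<Row> it) throws Exception {
            // value was serialized with this closure and shipped to the workers
            System.out.println("captured value: " + value);
        }
    });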

access a broadcasted variable from within ForeachPartitionFunction Java API

2017-06-15 Thread Anton Kravchenko
How would one access a broadcast variable from within ForeachPartitionFunction in the Spark (2.0.1) Java API?

    Integer _bcv = 123;
    Broadcast bcv = spark.sparkContext().broadcast(_bcv);
    Dataset df_sql = spark.sql("select * from atable");
    df_sql.foreachPartition(new ForeachPartitionFunction() {
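
For reference, a hedged sketch of one way to make this compile in plain Java; the broadcast is created through JavaSparkContext here, since calling SparkContext.broadcast from Java also requires a ClassTag (names follow the excerpt, the per-row body is a placeholder):

    JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
    final Broadcast<Integer> bcv = jsc.broadcast(123);
    Dataset<Row> df_sql = spark.sql("select * from atable");
    df_sql.foreachPartition(new ForeachPartitionFunction<Row>() {
        @Override
        public void call(Iterator<Row> it) throws Exception {
            Integer v = bcv.value();  // resolved on the executor
            while (it.hasNext()) {
                Row row = it.next();
                // placeholder: use v together with row here
            }
        }
    });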

Re: Java access to internal representation of DataTypes.DateType

2017-06-14 Thread Anton Kravchenko
L194. Kazuaki Ishizaki

Java access to internal representation of DataTypes.DateType

2017-06-13 Thread Anton Kravchenko
How would one access the internal representation of DataTypes.DateType from the Spark (2.0.1) Java API? From https://github.com/apache/spark/blob/51b1c1551d3a7147403b9e821fcc7c8f57b4824c/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DateType.scala : "Internally, this is represented as the number of days from 1970-01-01."
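
In the Java API that day count can be recovered by hand from the external java.sql.Date value; a minimal sketch (a hand-rolled conversion, not a Spark API call, and the row variable and column index are assumptions):

    // DateType values come back from row.getDate(i) as java.sql.Date;
    // internally Spark stores the number of days since 1970-01-01.
    java.sql.Date d = row.getDate(0);                       // assumes column 0 is a DateType
    int internalDays = (int) d.toLocalDate().toEpochDay();  // the same day count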

Re: foreachPartition in Spark Java API

2017-05-30 Thread Anton Kravchenko
    public void call(Iterator<Row> t) throws Exception {
        List<String> rows = new ArrayList<>();
        while (t.hasNext()) {
            Row irow = t.next();
            rows.add(irow.toString());
        }
        System.out.println(rows.toString());
    }
    }  // closes the anonymous ForeachPartitionFunction

On Tue, May 30, 2017 at 2:01 PM, Anton Kravchenko

Re: foreachPartition in Spark Java API

2017-05-30 Thread Anton Kravchenko
} } Anton

On Tue, May 30, 2017 at 10:58 AM, Anton Kravchenko <kravchenko.anto...@gmail.com> wrote: What would be a Java equivalent of the Scala code below?

    def void_function_in_scala(ipartition: Iterator[Row]): Unit = {
      var df_rows = ArrayBuffer[

foreachPartition in Spark Java API

2017-05-30 Thread Anton Kravchenko
What would be a Java equivalent of the Scala code below?

    def void_function_in_scala(ipartition: Iterator[Row]): Unit = {
      var df_rows = ArrayBuffer[String]()
      for (irow <- ipartition) {
        df_rows += irow.toString
      }
    }

    val df = spark.read.csv("file:///C:/input_data/*.csv")
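
A minimal Java equivalent, in line with the replies above (assumes df is a Dataset<Row> and that collecting the row strings is all the function does):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.spark.api.java.function.ForeachPartitionFunction;
    import org.apache.spark.sql.Row;

    df.foreachPartition(new ForeachPartitionFunction<Row>() {
        @Override
        public void call(Iterator<Row> ipartition) throws Exception {
            List<String> dfRows = new ArrayList<>();  // plays the role of df_rows
            while (ipartition.hasNext()) {
                dfRows.add(ipartition.next().toString());
            }
        }
    });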

[no subject]

2017-05-26 Thread Anton Kravchenko
    df.rdd.foreachPartition(convert_to_sas_single_partition)

    def convert_to_sas_single_partition(ipartition: Iterator[Row]): Unit = {
      for (irow <- ipartition) {
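
A hedged Java counterpart of the same RDD-level pattern, using org.apache.spark.api.java.function.VoidFunction; the convert-to-SAS body is elided in the excerpt, so the per-row work below is only a placeholder:

    df.toJavaRDD().foreachPartition(new VoidFunction<Iterator<Row>>() {
        @Override
        public void call(Iterator<Row> ipartition) throws Exception {
            while (ipartition.hasNext()) {
                Row irow = ipartition.next();
                // placeholder: convert irow here
            }
        }
    });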

spark rdd map error: too many arguments for unapply pattern, maximum = 22

2017-01-12 Thread Anton Kravchenko
Hi there, when I do an rdd map with more than 22 columns I get "error: too many arguments for unapply pattern, maximum = 22".

    scala> val rddRes = rows.map { case Row(col1, .. col23) => Row(...) }

Is there a known way to get around this issue? P.S. Here is the full traceback:
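
For reference, the usual workaround (an assumption from common practice, not an answer quoted in this thread) is to avoid the Row(...) extractor, which Scala 2.x caps at 22 bindings, and read fields by position instead; in Scala that means row.get(i) or row.toSeq, and a Java-API sketch of the same idea follows (rows is assumed to be a Dataset<Row>):

    JavaRDD<Row> rddRes = rows.toJavaRDD().map(new Function<Row, Row>() {
        @Override
        public Row call(Row row) throws Exception {
            Object[] values = new Object[row.length()];
            for (int i = 0; i < row.length(); i++) {
                values[i] = row.get(i);  // positional access has no 22-column limit
            }
            return RowFactory.create(values);
        }
    });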

Re: spark reshape hive table and save to parquet

2016-12-15 Thread Anton Kravchenko
Hope it will help. Thanks, Divya. On 9 December 2016 at 00:53, Anton Kravchenko <kravchenko.anto...@gmail.com> wrote: Hello, I wonder if there is a way (preferably efficient) in Spark to reshape a hive table and save it to pa

Re: spark reshape hive table and save to parquet

2016-12-14 Thread Anton Kravchenko
I am looking for something like:

    // prepare input data
    val input_schema = StructType(Seq(
      StructField("col1", IntegerType),
      StructField("col2", IntegerType),
      StructField("col3", IntegerType)))
    val input_data = spark.createDataFrame(
      sc.parallelize(Seq(
        Row(1, 2, 3),
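
The same input preparation in the Java API, for comparison; a hedged sketch, with the second data row taken from the original question below:

    List<Row> data = Arrays.asList(RowFactory.create(1, 2, 3), RowFactory.create(4, 5, 6));
    StructType inputSchema = new StructType()
        .add("col1", DataTypes.IntegerType)
        .add("col2", DataTypes.IntegerType)
        .add("col3", DataTypes.IntegerType);
    Dataset<Row> inputData = spark.createDataFrame(data, inputSchema);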

spark reshape hive table and save to parquet

2016-12-08 Thread Anton Kravchenko
Hello, I wonder if there is a way (preferably efficient) in Spark to reshape a hive table and save it to parquet. Here is a minimal example.

Input hive table:
col1 col2 col3
1    2    3
4    5    6

Output parquet:
col1 newcol2
1    [2 3]
4    [5 6]

P.S. The real input hive table has ~1000 columns. Thank you,
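
One way to express this reshape, as a hedged Java-API sketch (the table and output path names are hypothetical; for the ~1000-column case the array(...) argument list can be built programmatically from df.columns()):

    import static org.apache.spark.sql.functions.array;
    import static org.apache.spark.sql.functions.col;

    Dataset<Row> df = spark.sql("select * from atable");  // hypothetical table name
    df.select(col("col1"), array(col("col2"), col("col3")).as("newcol2"))
      .write().parquet("file:///C:/output_data/reshaped");  // hypothetical output path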