Re: Koalas show data in IDE or pyspark

2019-04-30 Thread Manu Zhang
Hi,

It seems a koalas.DataFrame can't be displayed in the terminal yet, as tracked in
https://github.com/databricks/koalas/issues/150, and the workaround is to
convert it to a pandas DataFrame.

Thanks,
Manu Zhang

On Tue, Apr 30, 2019 at 2:46 PM Achilleus 003 
wrote:

> Hello Everyone,
>
> I have been trying to run koalas in both pyspark and the PyCharm IDE.
>
> When I run
>
> import databricks.koalas as koalas
>
> df = koalas.DataFrame({'x': [1, 2], 'y': [3, 4], 'z': [5, 6]})
> df.head(5)
>
> I don't get the data back; instead, I get an object.
> 
>
> I thought df.head() could be used to achieve this.
>
> Can anyone guide me on how we can print something on the terminal?
> Something similar to df.show() in spark.
>
>


Re: [GraphX] Preserving Partitions when reading from HDFS

2019-04-15 Thread Manu Zhang
You may try
`sparkContext.hadoopConfiguration().set("mapred.max.split.size",
"33554432")` to tune the partition size when reading from HDFS.

Thanks,
Manu Zhang

On Mon, Apr 15, 2019 at 11:28 PM M Bilal  wrote:

> Hi,
>
> I have implemented a custom partitioning algorithm to partition graphs in
> GraphX. Saving the partitioned graph (the edges) to HDFS creates separate
> files in the output folder, with the number of files equal to the number of
> partitions.
>
> However, reading back the edges creates a number of partitions equal to the
> number of blocks in the HDFS folder. Is there a way to instead create the same
> number of partitions as the number of files written to HDFS, while preserving
> the original partitioning?
>
> I would like to avoid repartitioning.
>
> Thanks.
> - Bilal
>


Spark driver crashed with internal error

2019-04-07 Thread Manu Zhang
Hi all,

Recently, the driver of our Spark application (2.3.1) has been crashing just
before exit, with the following error.

Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (sharedRuntime.cpp:834), pid=40111, tid=0x2ac46180a700
#  fatal error: exception happened outside interpreter, nmethods and vtable stubs at pc 0x2ac1a832edb1
#
# JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)

Similar errors were reported for older JVM and Linux kernel versions
(https://bugs.openjdk.java.net/browse/JDK-8203612), but we are running Java
1.8.0_131 and kernel 3.10.0-693.21.1.el7.x86_64.

Here is the stack from the error report.

Current thread (0x2ac438005000):  JavaThread "block-manager-slave-async-thread-pool-2" daemon [_thread_in_Java, id=40362, stack(0x2ac46170a000,0x2ac46180b000)]

Stack: [0x2ac46170a000,0x2ac46180b000],  sp=0x2ac461808cd0, free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xac826a]  VMError::report_and_die()+0x2ba
V  [libjvm.so+0x4fd089]  report_fatal(char const*, int, char const*)+0x59
V  [libjvm.so+0x9c391a]  SharedRuntime::continuation_for_implicit_exception(JavaThread*, unsigned char*, SharedRuntime::ImplicitExceptionKind)+0x33a
V  [libjvm.so+0x92bbfa]  JVM_handle_linux_signal+0x48a
V  [libjvm.so+0x921e13]  signalHandler(int, siginfo*, void*)+0x43
C  [libpthread.so.0+0xf5d0]
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub
V  [libjvm.so+0x691d16]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1056
V  [libjvm.so+0x692221]  JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
V  [libjvm.so+0x6926c7]  JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x47
V  [libjvm.so+0x72da50]  thread_entry(JavaThread*, Thread*)+0xa0
V  [libjvm.so+0xa76833]  JavaThread::thread_main_inner()+0x103
V  [libjvm.so+0xa7697c]  JavaThread::run()+0x11c
V  [libjvm.so+0x927568]  java_start(Thread*)+0x108
C  [libpthread.so.0+0x7dd5]

Has anyone seen this kind of error before? I can provide more information if
needed.

Thanks,
Manu Zhang


Re: mapreduce.input.fileinputformat.split.maxsize not working for spark 2.4.0

2019-02-24 Thread Manu Zhang
Is your application using the Spark SQL / DataFrame API? If so, please try setting

spark.sql.files.maxPartitionBytes

which is 128MB by default; a smaller value will produce more, smaller input partitions.
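
For instance, a minimal Scala sketch (the path and the 50 MB value are
illustrative; this only applies when the files are read through the
DataFrame/Dataset reader, not through hadoopFile-style RDD APIs):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("split-size-example")
    .config("spark.sql.files.maxPartitionBytes", 50L * 1024 * 1024) // ~50 MB per partition
    .getOrCreate()

  val ds = spark.read.textFile("hdfs:///path/to/large/files")
  println(ds.rdd.getNumPartitions)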

Thanks,
Manu Zhang
On Feb 25, 2019, 2:58 AM +0800, Akshay Mendole , wrote:
> Hi,
>    We have dfs.blocksize configured to 512MB and we have some large files in
> hdfs that we want to process with a Spark application. We want to split the
> files into more splits to optimise for memory, but the above-mentioned
> parameters are not working.
> The max and min size params below are configured to 50MB, yet a file as big
> as 500MB is read as one split, while it is expected to split into at least 10
> input splits.
> SparkConf conf = new SparkConf().setAppName(jobName);
> SparkContext sparkContext = new SparkContext(conf);
> sparkContext.hadoopConfiguration().set("mapreduce.input.fileinputformat.split.maxsize", "5000");
> sparkContext.hadoopConfiguration().set("mapreduce.input.fileinputformat.split.minsize", "5000");
> JavaSparkContext sc = new JavaSparkContext(sparkContext);
> sc.hadoopConfiguration().set("io.compression.codecs", "com.hadoop.compression.lzo.LzopCodec");
>
> Could you please suggest what could be wrong with my configuration?
>
> Thanks,
> Akshay
>


Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Manu Zhang
Have you tried adding an Encoder for `columns`, as suggested by Jungtaek Lim?
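
For reference, a minimal self-contained sketch that compiles against Spark 2.3
(the object name and sample values are illustrative); the key points are a
top-level case class, so Spark can derive an Encoder for it, and importing the
implicits from the SparkSession:

  import org.apache.spark.sql.SparkSession

  // top-level case class, outside any method, so an Encoder can be derived
  case class Columns(key: String, ticker: String, timeissued: String, price: Float)

  object ToDfSketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().master("local[*]").appName("toDF-sketch").getOrCreate()
      import spark.implicits._ // brings toDF/toDS into scope for Seq and RDD of case classes

      val df = Seq(Columns("34e07d9f", "SAP", "2018-09-05T23:22:34", 56.89f)).toDF()
      df.show()

      spark.stop()
    }
  }

With those two changes, sparkContext.parallelize(Seq(...)).toDF should resolve as well.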

On Thu, Sep 6, 2018 at 6:24 AM Mich Talebzadeh 
wrote:

>
> I can rebuild the comma separated list as follows:
>
>
> case class columns(KEY: String, TICKER: String, TIMEISSUED: String, PRICE: Float)
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sparkContext)
> import sqlContext.implicits._
>
> for (line <- pricesRDD.collect.toArray)
> {
>   var key = line._2.split(',').view(0).toString
>   var ticker = line._2.split(',').view(1).toString
>   var timeissued = line._2.split(',').view(2).toString
>   var price = line._2.split(',').view(3).toFloat
>   var allInOne = key + "," + ticker + "," + timeissued + "," + price
>   println(allInOne)
>
> and the print shows the columns separated by ","
>
>
> 34e07d9f-829a-446a-93ab-8b93aa8eda41,SAP,2018-09-05T23:22:34,56.89
>
> So I just need to convert that row into a DataFrame.
>
> I try this conversion to a DataFrame in order to write to a MongoDB document
> with MongoSpark.save(df, writeConfig):
>
> var df = sparkContext.parallelize(Seq(columns(key, ticker, timeissued,
> price))).toDF
>
> [error]
> /data6/hduser/scala/md_streaming_mongoDB/src/main/scala/myPackage/md_streaming_mongoDB.scala:235:
> value toDF is not a member of org.apache.spark.rdd.RDD[columns]
> [error] var df = sparkContext.parallelize(Seq(columns(key,
> ticker, timeissued, price))).toDF
> [
>
>
> frustrating!
>
>  has anyone come across this?
>
> thanks
>
> On Wed, 5 Sep 2018 at 13:30, Mich Talebzadeh 
> wrote:
>
>> yep already tried it and it did not work.
>>
>> thanks
>>
>>
>> On Wed, 5 Sep 2018 at 10:10, Deepak Sharma  wrote:
>>
>>> Try this:
>>>
>>> import spark.implicits._
>>>
>>> df.toDF()
>>>
>>>
>>> On Wed, Sep 5, 2018 at 2:31 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 With the following

 case class columns(KEY: String, TICKER: String, TIMEISSUED: String,
 PRICE: Float)

  var key = line._2.split(',').view(0).toString
  var ticker =  line._2.split(',').view(1).toString
  var timeissued = line._2.split(',').view(2).toString
  var price = line._2.split(',').view(3).toFloat

   var df = Seq(columns(key, ticker, timeissued, price))
  println(df)

 I get


 List(columns(ac11a78d-82df-4b37-bf58-7e3388aa64cd,MKS,2018-09-05T10:10:15,676.5))

 So just need to convert that list to DF

 On Wed, 5 Sep 2018 at 09:49, Mich Talebzadeh 
 wrote:

> Thanks!
>
> The Spark version is 2.3.0.
>

Re: java.lang.UnsupportedOperationException: No Encoder found for Set[String]

2018-08-16 Thread Manu Zhang
You may try applying this PR: https://github.com/apache/spark/pull/18416.
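
If patching or upgrading Spark isn't an option, a possible workaround on 2.2.x
is to model the field as a Seq, which 2.2 can already encode, and convert to a
Set where the set semantics are actually needed. A minimal sketch, mirroring
the code from the question (not tested against 2.2.0 specifically):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().master("local[4]").getOrCreate()

  // same shape as TestCC in the question, but with Seq instead of Set
  case class TestCC2(i: Int, ss: Seq[String])

  import spark.implicits._

  val testCCDS = Seq(TestCC2(1, Seq("SS", "Salil")), TestCC2(2, Seq("xx", "XYZ"))).toDS()
  testCCDS.show()
  // call .toSet on the field inside map/collect wherever a Set is required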

On Fri, Aug 17, 2018 at 9:13 AM Venkat Dabri  wrote:

> We are using Spark 2.2.0. Is it possible to bring the ExpressionEncoder and
> related classes from 2.3.0 into my code base and use them? I see the changes
> in ExpressionEncoder between 2.3.0 and 2.2.0 are not much, but there might be
> many other classes underneath that have changed.
>
> On Thu, Aug 16, 2018 at 5:23 AM, Manu Zhang 
> wrote:
> > Hi,
> >
> > It's added since Spark 2.3.0.
> >
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala#L180
> >
> > Regards,
> > Manu Zhang
> >
> > On Thu, Aug 16, 2018 at 9:59 AM V0lleyBallJunki3 
> > wrote:
> >>
> >> Hello,
> >>   I am using Spark 2.2.2 with Scala 2.11.8. I wrote a short program
> >>
> >> val spark = SparkSession.builder().master("local[4]").getOrCreate()
> >>
> >> case class TestCC(i: Int, ss: Set[String])
> >>
> >> import spark.implicits._
> >> import spark.sqlContext.implicits._
> >>
> >> val testCCDS = Seq(TestCC(1,Set("SS","Salil")), TestCC(2, Set("xx",
> >> "XYZ"))).toDS()
> >>
> >>
> >> I get :
> >> java.lang.UnsupportedOperationException: No Encoder found for
> Set[String]
> >> - field (class: "scala.collection.immutable.Set", name: "ss")
> >> - root class: "TestCC"
> >>   at
> >>
> >>
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:632)
> >>   at
> >>
> >>
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:455)
> >>   at
> >>
> >>
> scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
> >>   at
> >>
> >>
> org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:809)
> >>   at
> >>
> >>
> org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
> >>   at
> >>
> >>
> org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:455)
> >>   at
> >>
> >>
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1$$anonfun$10.apply(ScalaReflection.scala:626)
> >>   at
> >>
> >>
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1$$anonfun$10.apply(ScalaReflection.scala:614)
> >>   at
> >>
> >>
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> >>
> >> To the best of my knowledge implicit support for Set has been added in
> >> Spark
> >> 2.2. Am I missing something?
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
> >>
> >> -
> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >>
> >
>


Re: Unable to see completed application in Spark 2 history web UI

2018-08-16 Thread Manu Zhang
Hi Fawze,

Sorry, but I'm not familiar with CM. Maybe you can look into the logs (or turn
on DEBUG logging).

On Thu, Aug 16, 2018 at 3:05 PM Fawze Abujaber  wrote:

> Hi Manu,
>
> I'm using Cloudera Manager in single user mode and every process is running
> as the cloudera-scm user. cloudera-scm is a superuser, and this is why I was
> confused that it worked in Spark 1.6 but not in Spark 2.3.
>
>
> On Thu, Aug 16, 2018 at 5:34 AM Manu Zhang 
> wrote:
>
>> If you are able to log onto the node where the UI has been launched, then try
>> `ps -aux | grep HistoryServer`; the first column of the output should be the
>> user.
>>
>> On Wed, Aug 15, 2018 at 10:26 PM Fawze Abujaber 
>> wrote:
>>
>>> Thanks Manu. Do you know how I can see which user the UI is running as? I'm
>>> using Cloudera Manager and I created a user for Cloudera Manager called
>>> spark, but this didn't solve my issue, and here I'm trying to find out the
>>> user for the Spark history UI.
>>>
>>> On Wed, Aug 15, 2018 at 5:11 PM Manu Zhang 
>>> wrote:
>>>
>>>> Hi Fawze,
>>>>
>>>> A) The file permission is currently hard-coded to 770 (
>>>> https://github.com/apache/spark/blob/branch-2.3/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L287
>>>> ).
>>>> B) I think adding all users (including the one running the UI) to the group (e.g. spark) will do.
>>>>
>>>>
>>>> On Wed, Aug 15, 2018 at 6:38 PM Fawze Abujaber 
>>>> wrote:
>>>>
>>>>> Hi Manu,
>>>>>
>>>>> Thanks for your response.
>>>>>
>>>>> Yes, I see, but I'm still interested to know how I can see these
>>>>> applications from the Spark history UI.
>>>>>
>>>>> How can I know which user I'm logged in as when I'm navigating the Spark
>>>>> history UI?
>>>>>
>>>>> The Spark process is running as cloudera-scm, and the events written to
>>>>> the spark2history folder in HDFS are written with the user name of whoever
>>>>> is running the application and group spark (770 permissions).
>>>>>
>>>>> I'm interested to see if I can force these logs to be written with 774 or
>>>>> 775 permissions, or find another solution that enables R&D or anyone to
>>>>> investigate their own application logs using the UI.
>>>>>
>>>>> For example, can I use a Spark conf such as
>>>>> spark.eventLog.permissions=755 ?
>>>>>
>>>>> The 2 options I see here:
>>>>>
>>>>> A) Find a way to enforce these logs to be written with other
>>>>> permissions.
>>>>>
>>>>> B) Find the user that the UI is running as, then create LDAP groups and a
>>>>> user that can handle this.
>>>>>
>>>>> For example, create a group called spark and a user that the UI runs as,
>>>>> and add this user to the spark group.
>>>>> I'm not sure if this option will work, as I don't know whether these steps
>>>>> authenticate against LDAP.
>>>>>
>>>>
>>>
>>> --
>>> Take Care
>>> Fawze Abujaber
>>>
>>
>
> --
> Take Care
> Fawze Abujaber
>


Re: java.lang.UnsupportedOperationException: No Encoder found for Set[String]

2018-08-16 Thread Manu Zhang
Hi,

It was added in Spark 2.3.0:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala#L180

Regards,
Manu Zhang

On Thu, Aug 16, 2018 at 9:59 AM V0lleyBallJunki3 
wrote:

> Hello,
>   I am using Spark 2.2.2 with Scala 2.11.8. I wrote a short program
>
> val spark = SparkSession.builder().master("local[4]").getOrCreate()
>
> case class TestCC(i: Int, ss: Set[String])
>
> import spark.implicits._
> import spark.sqlContext.implicits._
>
> val testCCDS = Seq(TestCC(1,Set("SS","Salil")), TestCC(2, Set("xx",
> "XYZ"))).toDS()
>
>
> I get :
> java.lang.UnsupportedOperationException: No Encoder found for Set[String]
> - field (class: "scala.collection.immutable.Set", name: "ss")
> - root class: "TestCC"
>   at
>
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:632)
>   at
>
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:455)
>   at
>
> scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
>   at
>
> org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:809)
>   at
>
> org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
>   at
>
> org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:455)
>   at
>
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1$$anonfun$10.apply(ScalaReflection.scala:626)
>   at
>
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1$$anonfun$10.apply(ScalaReflection.scala:614)
>   at
>
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>
> To the best of my knowledge implicit support for Set has been added in
> Spark
> 2.2. Am I missing something?
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: Unable to see completed application in Spark 2 history web UI

2018-08-15 Thread Manu Zhang
If you are able to log onto the node where the UI has been launched, then try
`ps -aux | grep HistoryServer`; the first column of the output should be the
user.

On Wed, Aug 15, 2018 at 10:26 PM Fawze Abujaber  wrote:

> Thanks Manu. Do you know how I can see which user the UI is running as? I'm
> using Cloudera Manager and I created a user for Cloudera Manager called
> spark, but this didn't solve my issue, and here I'm trying to find out the
> user for the Spark history UI.
>
> On Wed, Aug 15, 2018 at 5:11 PM Manu Zhang 
> wrote:
>
>> Hi Fawze,
>>
>> A) The file permission is currently hard-coded to 770 (
>> https://github.com/apache/spark/blob/branch-2.3/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L287
>> ).
>> B) I think adding all users (including the one running the UI) to the group (e.g. spark) will do.
>>
>>
>> On Wed, Aug 15, 2018 at 6:38 PM Fawze Abujaber  wrote:
>>
>>> Hi Manu,
>>>
>>> Thanks for your response.
>>>
>>> Yes, I see, but I'm still interested to know how I can see these
>>> applications from the Spark history UI.
>>>
>>> How can I know which user I'm logged in as when I'm navigating the Spark
>>> history UI?
>>>
>>> The Spark process is running as cloudera-scm, and the events written to the
>>> spark2history folder in HDFS are written with the user name of whoever is
>>> running the application and group spark (770 permissions).
>>>
>>> I'm interested to see if I can force these logs to be written with 774 or
>>> 775 permissions, or find another solution that enables R&D or anyone to
>>> investigate their own application logs using the UI.
>>>
>>> For example, can I use a Spark conf such as spark.eventLog.permissions=755 ?
>>>
>>> The 2 options I see here:
>>>
>>> A) Find a way to enforce these logs to be written with other permissions.
>>>
>>> B) Find the user that the UI is running as, then create LDAP groups and a
>>> user that can handle this.
>>>
>>> For example, create a group called spark and a user that the UI runs as,
>>> and add this user to the spark group.
>>> I'm not sure if this option will work, as I don't know whether these steps
>>> authenticate against LDAP.
>>>
>>
>
> --
> Take Care
> Fawze Abujaber
>


Re: Unable to see completed application in Spark 2 history web UI

2018-08-15 Thread Manu Zhang
Hi Fawze,

A) The file permission is currently hard-coded to 770 (
https://github.com/apache/spark/blob/branch-2.3/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L287
).
B) I think adding all users (including the one running the UI) to the group (e.g. spark) will do.


On Wed, Aug 15, 2018 at 6:38 PM Fawze Abujaber  wrote:

> Hi Manu,
>
> Thanks for your response.
>
> Yes, I see, but I'm still interested to know how I can see these applications
> from the Spark history UI.
>
> How can I know which user I'm logged in as when I'm navigating the Spark
> history UI?
>
> The Spark process is running as cloudera-scm, and the events written to the
> spark2history folder in HDFS are written with the user name of whoever is
> running the application and group spark (770 permissions).
>
> I'm interested to see if I can force these logs to be written with 774 or 775
> permissions, or find another solution that enables R&D or anyone to
> investigate their own application logs using the UI.
>
> For example, can I use a Spark conf such as spark.eventLog.permissions=755 ?
>
> The 2 options I see here:
>
> A) Find a way to enforce these logs to be written with other permissions.
>
> B) Find the user that the UI is running as, then create LDAP groups and a
> user that can handle this.
>
> For example, create a group called spark and a user that the UI runs as, and
> add this user to the spark group.
> I'm not sure if this option will work, as I don't know whether these steps
> authenticate against LDAP.
>


Re: Unable to see completed application in Spark 2 history web UI

2018-08-15 Thread Manu Zhang
Hi Fawze,

In Spark 2.3, the HistoryServer checks file permissions when reading event logs
written by your applications (please see
https://issues.apache.org/jira/browse/SPARK-20172). With file permissions of
770, the HistoryServer is not permitted to read the event logs. That's why you
were able to see applications once you changed the file permissions to 777.

Regards,
Manu Zhang

On Mon, Aug 13, 2018 at 4:53 PM Fawze Abujaber  wrote:

> Hi Guys,
>
> Any help here?
>
> On Wed, Aug 8, 2018 at 7:56 AM Fawze Abujaber  wrote:
>
>> Hello Community,
>>
>> I'm using Spark 2.3 and Spark 1.6.0 in my cluster with Cloudera
>> distribution 5.13.0.
>>
>> Both are configured to run on YARN, but I'm unable to see completed
>> applications in the Spark 2 history server, while in Spark 1.6.0 I could.
>>
>> 1) I checked the HDFS permissions for both folders and both have the same
>> permissions.
>>
>> drwxrwxrwt   - cloudera-scm spark  0 2018-08-08 00:46
>> /user/spark/applicationHistory
>> drwxrwxrwt   - cloudera-scm spark  0 2018-08-08 00:46
>> /user/spark/spark2ApplicationHistory
>>
>> The application files themselves are written with permissions 770 in both.
>>
>> -rwxrwx---   3  fawzea spark 4743751 2018-08-07 23:32
>> /user/spark/spark2ApplicationHistory/application_1527404701551_672816_1
>> -rwxrwx---   3  fawzea spark   134315 2018-08-08 00:41
>> /user/spark/applicationHistory/application_1527404701551_673359_1
>>
>> 2) No error in the Spark2 history server log.
>>
>> 3) I compared the configurations between Spark 1.6 and Spark 2.3 (system
>> user, log enabling, etc.) and everything looks the same.
>>
>> 4) Once I changed the permissions of the above Spark 2 application logs to
>> 777, I was able to see the application in the Spark 2 history server UI.
>>
>> I tried to figure out whether the two Spark UIs run as different users, but
>> was unable to find out.
>>
>> Has anyone run into this issue and solved it?
>>
>> Thanks in advance.
>>
>>
>> --
>> Take Care
>> Fawze Abujaber
>>
>
>
> --
> Take Care
> Fawze Abujaber
>


Re: Split a row into multiple rows Java

2018-08-08 Thread Manu Zhang
The following may help, although it's in Scala. The idea is to first
concatenate each value with its column label, assemble all of these strings
into an array and explode it, and finally split each element back into time
and value.

 import org.apache.spark.sql.functions._

 val ndf = df.select(col("name"), col("otherName"),
   explode(
     array(concat_ws(":", col("val1"), lit("val1")),
           concat_ws(":", col("val2"), lit("val2")),
           concat_ws(":", col("val3"), lit("val3")))
   ).alias("temp"))

 val fields = split(col("temp"), ":")
 ndf.select(col("name"), col("otherName"),
   fields.getItem(1).alias("time"),
   fields.getItem(0).alias("value"))
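
As a side note, if the set of columns is fixed, the same reshape can also be
expressed with Spark SQL's stack generator through selectExpr (a sketch,
assuming Spark 2.x; the expression string works the same way from Java):

 val unpivoted = df.selectExpr(
   "name", "otherName",
   "stack(3, 'val1', val1, 'val2', val2, 'val3', val3) as (time, value)")
 unpivoted.show()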

Regards,
Manu Zhang

On Wed, Aug 8, 2018 at 11:41 AM nookala  wrote:

> +-----+---------+----+----+----+
> | name|otherName|val1|val2|val3|
> +-----+---------+----+----+----+
> |  bob|       b1|   1|   2|   3|
> |alive|       c1|   3|   4|   6|
> |  eve|       e1|   7|   8|   9|
> +-----+---------+----+----+----+
>
> I need this to become
>
> +-----+---------+----+-----+
> | name|otherName|time|value|
> +-----+---------+----+-----+
> |  bob|       b1|val1|    1|
> |  bob|       b1|val2|    2|
> |  bob|       b1|val3|    3|
> |alive|       c1|val1|    3|
> |alive|       c1|val2|    4|
> |alive|       c1|val3|    6|
> |  eve|       e1|val1|    7|
> |  eve|       e1|val2|    8|
> |  eve|       e1|val3|    9|
> +-----+---------+----+-----+
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>