Re: Two Spark applications listen on the same port on the same machine
I'm sure only the first one is listening on the port, but in the master UI both applications redirect to the same machine and the same port. When I checked the URLs, they both redirect to the application UI of the first submitted job. So I think the problem may be only in the UI.

On Wed, Mar 6, 2019, 10:29 PM Sean Owen wrote:

> Two drivers can't be listening on port 4040 at the same time -- on the
> same machine. The OS wouldn't allow it. Are they actually on different
> machines or somehow different interfaces? Or are you saying the reported
> port is wrong?
>
> On Wed, Mar 6, 2019 at 12:23 PM Moein Hosseini wrote:
>
>> I've submitted two Spark applications to a cluster of 3 standalone nodes
>> at nearly the same time (I have a bash script that submits them one after
>> another without delay). But something goes wrong. In the master UI, the
>> Running Applications section shows both of my jobs with the correct
>> configuration (cores, memory, and different application IDs), but both
>> redirect to port 4040, which is listened on by the second submitted job.
>> I think it could be a race condition in the UI, but I found nothing in
>> the logs. Could you help me investigate where I should look for the cause?
>>
>> Best Regards
>> Moein
>>
>> --
>> Moein Hosseini
>> Data Engineer
>> mobile: +98 912 468 1859
>> site: www.moein.xyz
>> email: moein...@gmail.com
Two Spark applications listen on the same port on the same machine
I've submitted two Spark applications to a cluster of 3 standalone nodes at nearly the same time (I have a bash script that submits them one after another without delay). But something goes wrong. In the master UI, the Running Applications section shows both of my jobs with the correct configuration (cores, memory, and different application IDs), but both redirect to port 4040, which is listened on by the second submitted job. I think it could be a race condition in the UI, but I found nothing in the logs. Could you help me investigate where I should look for the cause?

Best Regards
Moein
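A note on the port behavior: by default each driver tries port 4040 for its UI and, if it is already taken, falls back to the next port (4041, 4042, and so on, up to spark.port.maxRetries attempts). One way to take the guesswork out of which application owns which port is to pin a distinct UI port per submission. A minimal sketch (the jar names here are hypothetical placeholders):

```shell
# Submit each application with an explicit, distinct UI port,
# so each driver UI address is unambiguous.
spark-submit --conf spark.ui.port=4040 first-app.jar
spark-submit --conf spark.ui.port=4041 second-app.jar
```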
A bug from 1.6.5 to 2.4.0
Today I ran into a bug on an HA master with ZooKeeper on 2.4.0, and found there has been an issue for it since 1.6.5, SPARK-15544 <https://issues.apache.org/jira/browse/SPARK-15544>, but it is not assigned and has no PR. As it was opened in 2016, please take a look and consider it.

Best regards
Re: [VOTE] [RESULT] SPIP: DataFrame-based Property Graphs, Cypher Queries, and Algorithms
+1 from me.

On Wed, Feb 13, 2019 at 2:19 AM Xiangrui Meng wrote:

> Hi all,
>
> The vote passed with the following +1s (* = binding) and no 0s/-1s:
>
> * Denny Lee
> * Jules Damji
> * Xiao Li*
> * Dongjoon Hyun
> * Mingjie Tang
> * Yanbo Liang*
> * Marco Gaido
> * Joseph Bradley*
> * Xiangrui Meng*
>
> Please watch SPARK-25994 and join future discussions there. Thanks!
>
> Best,
> Xiangrui
Re: Feature request: split dataset based on condition
I don't see it as a way to apply filtering multiple times; instead, think of it as a semi-action, not just a transformation. Imagine something like mapPartitions that accepts multiple lambdas, where each one collects its rows for its own dataset (or something like that). Is that possible?

On Sat, Feb 2, 2019 at 5:59 PM Sean Owen wrote:

> I think the problem is that you can't produce multiple Datasets from one
> source in one operation -- consider that reproducing one of them would mean
> reproducing all of them. You can write a method that would do the filtering
> multiple times, but it wouldn't be faster. What do you have in mind that's
> different?
>
> On Sat, Feb 2, 2019 at 12:19 AM Moein Hosseini wrote:
>
>> I've seen many applications that need to split a dataset into multiple
>> datasets based on some conditions. As there is no method to do it in one
>> place, developers call the *filter* method multiple times. I think it
>> would be useful to have a method that splits a dataset based on a
>> condition in one iteration, something like Scala's *partition* method
>> (of course Scala's partition just splits a list into two lists, but
>> something more general could be more useful).
>> If you think it would be helpful, I can create a Jira issue and work on
>> it to send a PR.
>>
>> Best Regards
>> Moein
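The semantics being proposed could look like a single pass that routes each row to the first predicate it matches. A sketch on plain Python lists (the function name `split_by` is hypothetical; lists stand in for Datasets here, this is not Spark API):

```python
def split_by(rows, *predicates):
    """Route each row into one bucket per predicate, in a single pass.

    Returns one list per predicate plus a final list for rows that
    matched no predicate. A row lands in the bucket of the first
    predicate it satisfies.
    """
    buckets = [[] for _ in range(len(predicates) + 1)]
    for row in rows:
        for i, pred in enumerate(predicates):
            if pred(row):
                buckets[i].append(row)
                break
        else:  # no predicate matched this row
            buckets[-1].append(row)
    return buckets

evens, odds, rest = split_by(range(10),
                             lambda x: x % 2 == 0,
                             lambda x: x % 2 == 1)
# evens -> [0, 2, 4, 6, 8], odds -> [1, 3, 5, 7, 9], rest -> []
```

The single pass is the point of the proposal: each row is tested until its first matching predicate, rather than the whole source being re-scanned once per filter.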
Feature request: split dataset based on condition
I've seen many applications that need to split a dataset into multiple datasets based on some conditions. As there is no method to do it in one place, developers call the *filter* method multiple times. I think it would be useful to have a method that splits a dataset based on a condition in one iteration, something like Scala's *partition* method (of course Scala's partition just splits a list into two lists, but something more general could be more useful). If you think it would be helpful, I can create a Jira issue and work on it to send a PR.

Best Regards
Moein
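For reference, the Scala `partition` method mentioned above splits a collection into exactly two lists in one pass, according to a single predicate. A small Python equivalent (illustrative only, not part of any Spark or Scala API):

```python
def partition(items, predicate):
    """Split items into (matching, non_matching) in a single pass,
    mirroring Scala's List.partition."""
    matching, non_matching = [], []
    for item in items:
        (matching if predicate(item) else non_matching).append(item)
    return matching, non_matching

small, large = partition([3, 8, 1, 9, 4], lambda x: x < 5)
# small -> [3, 1, 4], large -> [8, 9]
```

The feature request generalizes this two-way split to N-way, driven by multiple conditions over a Dataset.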
Why outdated third-parties exist on documentation?
Hi everyone,

I was taking a look at the Spark documentation about third-party projects <http://spark.apache.org/third-party-projects.html> and monitoring <http://spark.apache.org/docs/latest/monitoring.html> and realized that many of the listed projects are discontinued. For example, BlinkDB <https://github.com/sameeragarwal/blinkdb> has had no commits over the last 5 years, and the last release of ganglia <http://ganglia.info/> (a monitoring tool) was in 2015. Is there any plan to keep such old-school tools, or should we remove them from the documentation?