Re: welcome a new batch of committers

2018-10-05 Thread Bhupendra Mishra
Congratulations to all of you!
Good luck.
Regards

On Wed, Oct 3, 2018 at 2:29 PM Reynold Xin  wrote:

> Hi all,
>
> The Apache Spark PMC has recently voted to add several new committers to
> the project, for their contributions:
>
> - Shane Knapp (contributor to infra)
> - Dongjoon Hyun (contributor to ORC support and other parts of Spark)
> - Kazuaki Ishizaki (contributor to Spark SQL)
> - Xingbo Jiang (contributor to Spark Core and SQL)
> - Yinan Li (contributor to Spark on Kubernetes)
> - Takeshi Yamamuro (contributor to Spark SQL)
>
> Please join me in welcoming them!
>
>


Re: Welcome Zhenhua Wang as a Spark committer

2018-04-03 Thread Bhupendra Mishra
Welcome and congratulations, Zhenhua. Cheers

On Mon, Apr 2, 2018 at 10:58 AM, Wenchen Fan  wrote:

> Hi all,
>
> The Spark PMC recently added Zhenhua Wang as a committer on the project.
> Zhenhua is the major contributor to the CBO project, and has been
> contributing across several areas of Spark for a while, focusing especially
> on the analyzer and optimizer in Spark SQL. Please join me in welcoming Zhenhua!
>
> Wenchen
>


<console>:40: error: value join is not a member of Unit

2017-04-19 Thread Bhupendra Mishra
Hi All,
I need your help with the error below.

The following code, when executed, produces this error:

scala> val join = bat_first_won.join(total_matches_per_venue).map(x => (x._1, x._2._1*100/x._2._2)).map(item => item.swap).sortByKey(false).collect.foreach(println)
<console>:40: error: value join is not a member of Unit
       val join = bat_first_won.join(total_matches_per_venue).map(x => (x._1, x._2._1*100/x._2._2)).map(item => item.swap).sortByKey(false).collect.foreach(println)
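For reference: this error means that bat_first_won itself has type Unit, so it is not an RDD at the point of the join. The usual cause is that bat_first_won was earlier assigned the result of an expression ending in .collect.foreach(println), which returns Unit. A minimal sketch of the likely fix in spark-shell, with made-up sample data (the venue names and counts below are assumptions, not from the original post):

// Keep RDD definitions separate from side-effecting prints: an expression
// ending in foreach(println) returns Unit, and a Unit value cannot be joined.
val batFirstWon = sc.parallelize(Seq(("VenueA", 30), ("VenueB", 25)))          // hypothetical (venue, wins batting first)
val totalMatchesPerVenue = sc.parallelize(Seq(("VenueA", 50), ("VenueB", 40))) // hypothetical (venue, total matches)

val winPctByVenue = batFirstWon
  .join(totalMatchesPerVenue)                        // (venue, (won, total))
  .map { case (venue, (won, total)) => (won * 100 / total, venue) }
  .sortByKey(ascending = false)                      // highest percentage first

// Print as a separate, final step; do not reuse its Unit result.
winPctByVenue.collect.foreach(println)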


Re: ImportError: No module named numpy

2016-06-17 Thread Bhupendra Mishra
The issue has been fixed. After a lot of digging around, I finally found the
pretty simple thing that was causing this problem.

It was a permission issue on the Python libraries: the user I was logged in
as did not have enough permission to read/execute the following Python
library directories.

 /usr/lib/python2.7/site-packages/
/usr/lib64/python2.7/

So the paths above must have read/execute permission for the user running
the Python/PySpark program.
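A quick way to verify this from the affected user's account (just a sketch; it checks only the two directories listed above):

import os

# Confirm the current user can read and traverse each library directory;
# os.X_OK on a directory checks that it can be entered.
for path in ["/usr/lib/python2.7/site-packages/", "/usr/lib64/python2.7/"]:
    print("%s readable=%s executable=%s"
          % (path, os.access(path, os.R_OK), os.access(path, os.X_OK)))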

Thanks, everyone, for your help with this. Much appreciated!
Regards


On Sun, Jun 5, 2016 at 12:04 AM, Daniel Rodriguez <df.rodriguez...@gmail.com
> wrote:

> As people have said, you need numpy on all the nodes of the cluster. The
> easiest way, in my opinion, is to use Anaconda:
> https://www.continuum.io/downloads but that can get tricky to manage across
> multiple nodes if you don't have some configuration management skills.
>
> How are you deploying the Spark cluster? If you are using Cloudera, I
> recommend using the Anaconda Parcel:
> http://blog.cloudera.com/blog/2016/02/making-python-on-apache-hadoop-easier-with-anaconda-and-cdh/
>
> On 4 Jun 2016, at 11:13, Gourav Sengupta <gourav.sengu...@gmail.com>
> wrote:
>
> Hi,
>
> I think the solution is really simple. Just download Anaconda (if you pay
> for the licensed version you will eventually feel like you are in heaven when
> you move to CI and CD and live in a world where you have a data product
> actually running in real life).
>
> Then start the pyspark program by including the following:
>
> PYSPARK_PYTHON=<<installation>>/anaconda2/bin/python2.7 PATH=$PATH:<<installation>>/anaconda/bin <<installation>>/pyspark
>
> :)
>
> In case you are using it in EMR the solution is a bit tricky. Just let me
> know in case you want any further help.
>
>
> Regards,
> Gourav Sengupta
>
>
>
>
>
> On Thu, Jun 2, 2016 at 7:59 PM, Eike von Seggern <
> eike.segg...@sevenval.com> wrote:
>
>> Hi,
>>
>> are you using Spark on one machine or many?
>>
>> If on many, are you sure numpy is correctly installed on all machines?
>>
>> To check that the environment is set-up correctly, you can try something
>> like
>>
>> import os
>> pythonpaths = sc.range(10).map(lambda i:
>> os.environ.get("PYTHONPATH")).collect()
>> print(pythonpaths)
>>
>> HTH
>>
>> Eike
>>
>> 2016-06-02 15:32 GMT+02:00 Bhupendra Mishra <bhupendra.mis...@gmail.com>:
>>
>>> It did not resolve the issue. :(
>>>
>>> On Thu, Jun 2, 2016 at 3:01 PM, Sergio Fernández <wik...@apache.org>
>>> wrote:
>>>
>>>>
>>>> On Thu, Jun 2, 2016 at 9:59 AM, Bhupendra Mishra <
>>>> bhupendra.mis...@gmail.com> wrote:
>>>>>
>>>>> and I have already exported the environment variable in spark-env.sh as
>>>>> follows, but the error is still there: ImportError: No module named numpy
>>>>>
>>>>> export PYSPARK_PYTHON=/usr/bin/python
>>>>>
>>>>
>>>> According to the documentation at
>>>> http://spark.apache.org/docs/latest/configuration.html#environment-variables
>>>> the PYSPARK_PYTHON environment variable is for pointing to the Python
>>>> interpreter binary.
>>>>
>>>> If you check the programming guide
>>>> https://spark.apache.org/docs/0.9.0/python-programming-guide.html#installing-and-configuring-pyspark
>>>> it says you need to add your custom path to PYTHONPATH (the bin/pyspark
>>>> script automatically adds its own entries there).
>>>>
>>>> So typically in Linux you would need to add the following (assuming you
>>>> installed numpy there):
>>>>
>>>> export PYTHONPATH=$PYTHONPATH:/usr/lib/python2.7/dist-packages
>>>>
>>>> Hope that helps.
>>>>
>>>>
>>>>
>>>>
>>>>> On Thu, Jun 2, 2016 at 12:04 AM, Julio Antonio Soto de Vicente <
>>>>> ju...@esbet.es> wrote:
>>>>>
>>>>>> Try adding to spark-env.sh (renaming if you still have it with
>>>>>> .template at the end):
>>>>>>
>>>>>> PYSPARK_PYTHON=/path/to/your/bin/python
>>>>>>
>>>>>> Where your bin/python is your actual Python environment with Numpy
>>>>>> installed.
>>>>>>
>>>>>>
>>>>>> El 1 jun 2016, a las 20:16, Bhupendra Mishra <
>>>>>> bhupendra.mis...@gmail.com> escribió:
>>>>>>
>>>>>> I have numpy installed, but where should I set up PYTHONPATH?

Re: Welcoming Yanbo Liang as a committer

2016-06-03 Thread Bhupendra Mishra
Congratulations, Yanbo!


On Sat, Jun 4, 2016 at 9:08 AM, Dongjoon Hyun  wrote:

> Wow, Congratulations, Yanbo!
>
> Dongjoon.
>
> On Fri, Jun 3, 2016 at 8:22 PM, Xiao Li  wrote:
>
>> Congratulations, Yanbo!
>>
>> 2016-06-03 19:54 GMT-07:00 Nan Zhu :
>>
>>> Congratulations !
>>>
>>> --
>>> Nan Zhu
>>>
>>> On June 3, 2016 at 10:50:33 PM, Ted Yu (yuzhih...@gmail.com) wrote:
>>>
>>> Congratulations, Yanbo.
>>>
>>> On Fri, Jun 3, 2016 at 7:48 PM, Matei Zaharia 
>>> wrote:
>>>
 Hi all,

 The PMC recently voted to add Yanbo Liang as a committer. Yanbo has
 been a super active contributor in many areas of MLlib. Please join me in
 welcoming Yanbo!

 Matei
 ---------------------------------------------------------------------
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


>>>
>>
>


Re: ImportError: No module named numpy

2016-06-02 Thread Bhupendra Mishra
It did not resolve the issue. :(

On Thu, Jun 2, 2016 at 3:01 PM, Sergio Fernández <wik...@apache.org> wrote:

>
> On Thu, Jun 2, 2016 at 9:59 AM, Bhupendra Mishra <
> bhupendra.mis...@gmail.com> wrote:
>>
>> and I have already exported the environment variable in spark-env.sh as
>> follows, but the error is still there: ImportError: No module named numpy
>>
>> export PYSPARK_PYTHON=/usr/bin/python
>>
>
> According to the documentation at
> http://spark.apache.org/docs/latest/configuration.html#environment-variables
> the PYSPARK_PYTHON environment variable is for pointing to the Python
> interpreter binary.
>
> If you check the programming guide
> https://spark.apache.org/docs/0.9.0/python-programming-guide.html#installing-and-configuring-pyspark
> it says you need to add your custom path to PYTHONPATH (the bin/pyspark
> script automatically adds its own entries there).
>
> So typically in Linux you would need to add the following (assuming you
> installed numpy there):
>
> export PYTHONPATH=$PYTHONPATH:/usr/lib/python2.7/dist-packages
>
> Hope that helps.
>
>
>
>
>> On Thu, Jun 2, 2016 at 12:04 AM, Julio Antonio Soto de Vicente <
>> ju...@esbet.es> wrote:
>>
>>> Try adding to spark-env.sh (renaming if you still have it with .template
>>> at the end):
>>>
>>> PYSPARK_PYTHON=/path/to/your/bin/python
>>>
>>> Where your bin/python is your actual Python environment with Numpy
>>> installed.
>>>
>>>
>>> El 1 jun 2016, a las 20:16, Bhupendra Mishra <bhupendra.mis...@gmail.com>
>>> escribió:
>>>
>>> I have numpy installed, but where should I set up PYTHONPATH?
>>>
>>>
>>> On Wed, Jun 1, 2016 at 11:39 PM, Sergio Fernández <wik...@apache.org>
>>> wrote:
>>>
>>>> sudo pip install numpy
>>>>
>>>> On Wed, Jun 1, 2016 at 5:56 PM, Bhupendra Mishra <
>>>> bhupendra.mis...@gmail.com> wrote:
>>>>
>>>>> Thanks.
>>>>> How can this be resolved?
>>>>>
>>>>> On Wed, Jun 1, 2016 at 9:02 PM, Holden Karau <hol...@pigscanfly.ca>
>>>>> wrote:
>>>>>
>>>>>> Generally this means numpy isn't installed on the system, or your
>>>>>> PYTHONPATH has somehow gotten pointed somewhere odd.
>>>>>>
>>>>>> On Wed, Jun 1, 2016 at 8:31 AM, Bhupendra Mishra <
>>>>>> bhupendra.mis...@gmail.com> wrote:
>>>>>>
>>>>>>> Can anyone please help me with the following error?
>>>>>>>
>>>>>>>  File
>>>>>>> "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
>>>>>>> line 25, in <module>
>>>>>>>
>>>>>>> ImportError: No module named numpy
>>>>>>>
>>>>>>>
>>>>>>> Thanks in advance!
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Cell : 425-233-8271
>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Sergio Fernández
>>>> Partner Technology Manager
>>>> Redlink GmbH
>>>> m: +43 6602747925
>>>> e: sergio.fernan...@redlink.co
>>>> w: http://redlink.co
>>>>
>>>
>>>
>>
>
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: sergio.fernan...@redlink.co
> w: http://redlink.co
>
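Taken together, the advice in this thread amounts to two lines in spark-env.sh. A combined sketch (the dist-packages path is an assumption about where numpy is installed on your machines):

# Interpreter the executors should use (one that actually has numpy)
export PYSPARK_PYTHON=/usr/bin/python
# Directory containing numpy, if it is not already on the default path
export PYTHONPATH=$PYTHONPATH:/usr/lib/python2.7/dist-packages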


Re: ImportError: No module named numpy

2016-06-02 Thread Bhupendra Mishra
It's RHEL,

and I have already exported the environment variable in spark-env.sh as
follows, but the error is still there: ImportError: No module named numpy

export PYSPARK_PYTHON=/usr/bin/python

Thanks.

On Thu, Jun 2, 2016 at 12:04 AM, Julio Antonio Soto de Vicente <
ju...@esbet.es> wrote:

> Try adding to spark-env.sh (renaming if you still have it with .template
> at the end):
>
> PYSPARK_PYTHON=/path/to/your/bin/python
>
> Where your bin/python is your actual Python environment with Numpy
> installed.
>
>
> El 1 jun 2016, a las 20:16, Bhupendra Mishra <bhupendra.mis...@gmail.com>
> escribió:
>
> I have numpy installed, but where should I set up PYTHONPATH?
>
>
> On Wed, Jun 1, 2016 at 11:39 PM, Sergio Fernández <wik...@apache.org>
> wrote:
>
>> sudo pip install numpy
>>
>> On Wed, Jun 1, 2016 at 5:56 PM, Bhupendra Mishra <
>> bhupendra.mis...@gmail.com> wrote:
>>
>>> Thanks.
>>> How can this be resolved?
>>>
>>> On Wed, Jun 1, 2016 at 9:02 PM, Holden Karau <hol...@pigscanfly.ca>
>>> wrote:
>>>
>>>> Generally this means numpy isn't installed on the system, or your
>>>> PYTHONPATH has somehow gotten pointed somewhere odd.
>>>>
>>>> On Wed, Jun 1, 2016 at 8:31 AM, Bhupendra Mishra <
>>>> bhupendra.mis...@gmail.com> wrote:
>>>>
>>>>> Can anyone please help me with the following error?
>>>>>
>>>>>  File
>>>>> "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
>>>>> line 25, in <module>
>>>>>
>>>>> ImportError: No module named numpy
>>>>>
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Cell : 425-233-8271
>>>> Twitter: https://twitter.com/holdenkarau
>>>>
>>>
>>>
>>
>>
>> --
>> Sergio Fernández
>> Partner Technology Manager
>> Redlink GmbH
>> m: +43 6602747925
>> e: sergio.fernan...@redlink.co
>> w: http://redlink.co
>>
>
>


Re: ImportError: No module named numpy

2016-06-01 Thread Bhupendra Mishra
I have numpy installed, but where should I set up PYTHONPATH?


On Wed, Jun 1, 2016 at 11:39 PM, Sergio Fernández <wik...@apache.org> wrote:

> sudo pip install numpy
>
> On Wed, Jun 1, 2016 at 5:56 PM, Bhupendra Mishra <
> bhupendra.mis...@gmail.com> wrote:
>
>> Thanks.
>> How can this be resolved?
>>
>> On Wed, Jun 1, 2016 at 9:02 PM, Holden Karau <hol...@pigscanfly.ca>
>> wrote:
>>
>>> Generally this means numpy isn't installed on the system, or your
>>> PYTHONPATH has somehow gotten pointed somewhere odd.
>>>
>>> On Wed, Jun 1, 2016 at 8:31 AM, Bhupendra Mishra <
>>> bhupendra.mis...@gmail.com> wrote:
>>>
>>>> Can anyone please help me with the following error?
>>>>
>>>>  File
>>>> "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
>>>> line 25, in <module>
>>>>
>>>> ImportError: No module named numpy
>>>>
>>>>
>>>> Thanks in advance!
>>>>
>>>>
>>>
>>>
>>> --
>>> Cell : 425-233-8271
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
>>
>
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: sergio.fernan...@redlink.co
> w: http://redlink.co
>


Re: ImportError: No module named numpy

2016-06-01 Thread Bhupendra Mishra
Thanks.
How can this be resolved?

On Wed, Jun 1, 2016 at 9:02 PM, Holden Karau <hol...@pigscanfly.ca> wrote:

> Generally this means numpy isn't installed on the system, or your
> PYTHONPATH has somehow gotten pointed somewhere odd.
>
> On Wed, Jun 1, 2016 at 8:31 AM, Bhupendra Mishra <
> bhupendra.mis...@gmail.com> wrote:
>
>> Can anyone please help me with the following error?
>>
>>  File
>> "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
>> line 25, in <module>
>>
>> ImportError: No module named numpy
>>
>>
>> Thanks in advance!
>>
>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>
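Following up on the diagnosis above: a small probe run from a PySpark shell can show whether the executors themselves can import numpy. This is only a sketch; numpy_version is a hypothetical helper, and sc is the usual SparkContext provided by the shell:

def numpy_version(_):
    # Return the numpy version visible to the executor running this task,
    # or None if the import fails there.
    try:
        import numpy
        return numpy.__version__
    except ImportError:
        return None

# Spread 10 tasks across the cluster and collect the distinct answers;
# a None in the output means at least one executor cannot import numpy.
print(sc.parallelize(range(10), 10).map(numpy_version).distinct().collect())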


ImportError: No module named numpy

2016-06-01 Thread Bhupendra Mishra
Can anyone please help me with the following error?

 File
"/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
line 25, in <module>

ImportError: No module named numpy


Thanks in advance!


Re: Welcoming two new committers

2016-02-08 Thread Bhupendra Mishra
Congratulations to both, and welcome to the group.

On Mon, Feb 8, 2016 at 10:45 PM, Matei Zaharia 
wrote:

> Hi all,
>
> The PMC has recently added two new Spark committers -- Herman van Hovell
> and Wenchen Fan. Both have been heavily involved in Spark SQL and Tungsten,
> adding new features, optimizations and APIs. Please join me in welcoming
> Herman and Wenchen.
>
> Matei
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-25 Thread Bhupendra Mishra
+1

On Fri, Dec 25, 2015 at 8:31 PM, vaquar khan  wrote:

> +1
> On 24 Dec 2015 22:01, "Vinay Shukla"  wrote:
>
>> +1
>> Tested on HDP 2.3, YARN cluster mode, spark-shell
>>
>> On Wed, Dec 23, 2015 at 6:14 AM, Allen Zhang 
>> wrote:
>>
>>>
>>> +1 (non-binding)
>>>
>>> I have just built a new binary tarball and tested am.nodelabelexpression
>>> and executor.nodelabelexpression manually; the results are as expected.
>>>
>>>
>>>
>>>
>>> At 2015-12-23 21:44:08, "Iulian Dragoș" 
>>> wrote:
>>>
>>> +1 (non-binding)
>>>
>>> Tested Mesos deployments (client and cluster-mode, fine-grained and
>>> coarse-grained). Things look good.
>>>
>>> iulian
>>>
>>> On Wed, Dec 23, 2015 at 2:35 PM, Sean Owen  wrote:
>>>
 Docker integration tests still fail for Mark and me, and should
 probably be disabled:
 https://issues.apache.org/jira/browse/SPARK-12426

 ... but if anyone else successfully runs these (and I assume Jenkins
 does) then it's not a blocker.

 I'm having intermittent trouble with other tests passing, but nothing
 unusual.
 Sigs and hashes are OK.

 We have 30 issues fixed for 1.6.1. All but those resolved in the last
 24 hours or so should be fixed for 1.6.0, right? I can touch that up.





 On Tue, Dec 22, 2015 at 8:10 PM, Michael Armbrust
  wrote:
 > Please vote on releasing the following candidate as Apache Spark version
 > 1.6.0!
 >
 > The vote is open until Friday, December 25, 2015 at 18:00 UTC and passes if
 > a majority of at least 3 +1 PMC votes are cast.
 >
 > [ ] +1 Release this package as Apache Spark 1.6.0
 > [ ] -1 Do not release this package because ...
 >
 > To learn more about Apache Spark, please see http://spark.apache.org/
 >
 > The tag to be voted on is v1.6.0-rc4
 > (4062cda3087ae42c6c3cb24508fc1d3a931accdf)
 >
 > The release files, including signatures, digests, etc. can be found at:
 >
 > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/
 >
 > Release artifacts are signed with the following key:
 > https://people.apache.org/keys/committer/pwendell.asc
 >
 > The staging repository for this release can be found at:
 >
 > https://repository.apache.org/content/repositories/orgapachespark-1176/
 >
 > The test repository (versioned as v1.6.0-rc4) for this release can be found
 > at:
 >
 > https://repository.apache.org/content/repositories/orgapachespark-1175/
 >
 > The documentation corresponding to this release can be found at:
 >
 > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/
 >
 > =======================================
 > == How can I help test this release? ==
 > =======================================
 > If you are a Spark user, you can help us test this release by taking an
 > existing Spark workload and running on this release candidate, then
 > reporting any regressions.
 >
 > 
 > ================================================
 > == What justifies a -1 vote for this release? ==
 > ================================================
 > 
 > This vote is happening towards the end of the 1.6 QA period, so -1 votes
 > should only occur for significant regressions from 1.5. Bugs already present
 > in 1.5, minor regressions, or bugs related to new features will not block
 > this release.
 >
 > ===============================================================
 > == What should happen to JIRA tickets still targeting 1.6.0? ==
 > ===============================================================
 > 1. It is OK for documentation patches to target 1.6.0 and still go into
 > branch-1.6, since documentation will be published separately from the
 > release.
 > 2. New features for non-alpha-modules should target 1.7+.
 > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
 > version.
 >
 >
 > ==================================================
 > == Major changes to help you focus your testing ==
 > ==================================================
 >
 > Notable changes since 1.6 RC3
 >
 >
 >   - SPARK-12404 - Fix serialization error for Datasets with
 > Timestamps/Arrays/Decimal
 >   - SPARK-12218 - Fix incorrect pushdown of filters to parquet
 >   - SPARK-12395 - Fix join columns of outer join for DataFrame using
 >   - SPARK-12413 - Fix mesos HA
 >
 >
 > Notable changes since 1.6 RC2
 >
 >
 > - SPARK_VERSION has been set correctly
 > - SPARK-12199 ML Docs are publishing correctly
 > - SPARK-12345 Mesos cluster mode