Re: please care and vote for Chinese people under cruel autocracy of CCP, great thanks!

2019-08-29 Thread Yan Facai
Please do not send spam email. Thanks.

On Thu, 29 Aug 2019, 13:05 ant_fighter wrote:

> Hi all,
> Sorry for disturbing you all. Though I don't think this is a proper place
> to do this, I need your help, your vote, your holy vote, for us Chinese,
> for conscience and justice, for a better world.
>
> In its over 70 years of ruling China, the Chinese Communist Party has
> done many of the most horrible things humans can think of. These malicious
> and evil deeds include but are not limited to: falsifying national history,
> suppressing freedom of speech and the press, money laundering on the
> scale of trillions, live organ harvesting, sexual harassment and assault
> against underage females, slaughtering innocent citizens under
> counter-revolutionary pretexts, etc.
>
> In light of the recent violent actions against Hong Kongers by the People's
> Liberation Army (PLA) disguised as the Hong Kong Police Force, we the people
> petition to officially recognize the Chinese Communist Party as a terrorist
> organization.
> PLEASE SIGN UP and VOTE for us:
>
> https://petitions.whitehouse.gov/petition/call-official-recognition-chinese-communist-party-terrorist-organization
>
> Thanks again to you all!
>
> nameless, an ant fighter
> 2019.8.29
>


Re: [ML] Migrating transformers from mllib to ml

2017-11-07 Thread Yan Facai
Hi, I have migrated HashingTF from mllib to ml, and it is awaiting review.

see:
[SPARK-21748][ML] Migrate the implementation of HashingTF from MLlib to ML
#18998
https://github.com/apache/spark/pull/18998
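
For context, a minimal sketch of the round-trip the migration removes
(hypothetical code, not the actual Spark source): the old ml wrapper
delegates to the mllib implementation, so every row pays a conversion
between the mllib and ml vector types.

import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.feature.{HashingTF => OldHashingTF}

// Illustrative only: hash via the mllib implementation, then convert the
// resulting mllib vector back to the ml type for the DataFrame column.
val oldTF = new OldHashingTF(1 << 18)
def hashRow(terms: Seq[String]): MLVector =
  oldTF.transform(terms).asML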



On Mon, Nov 6, 2017 at 10:58 PM, Marco Gaido  wrote:

> Hello,
>
> I saw that there are several TODOs to migrate some transformers (like
> HashingTF and IDF) to use only ml.Vector in order to avoid the overhead of
> converting them to the mllib ones and back.
>
> Is there any reason why this has not been done so far? Is it to avoid code
> duplication? If so, is it still an issue, since we are going to deprecate
> mllib from 2.3 (at least that is what I read in the Spark docs)? If not, I
> can work on this.
>
> Thanks,
> Marco
>
>
>


Re: LibSVM should have just one input file

2017-06-11 Thread Yan Facai
Hi, yaphet.
It seems that the code you pasted is located in LibSVM, rather than SVM.
Do I misunderstand?

For LibSVMDataSource:
1. If numFeatures is unspecified, only one input file is valid.

val df = spark.read.format("libsvm")
  .load("data/mllib/sample_libsvm_data.txt")

2. Otherwise, multiple files are OK (see also the sketch after the example below).

val df = spark.read.format("libsvm")
  .option("numFeatures", "780")
  .load("data/mllib/sample_libsvm_data.txt")


For more, see: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.source.libsvm.LibSVMDataSource


On Mon, Jun 12, 2017 at 11:46 AM, darion.yaphet  wrote:

> Hi team :
>
> Currently, when we use SVM to train on a dataset, we found that the
> input is limited to a single file.
>
> The source code is as follows:
>
> val path = if (dataFiles.length == 1) {
>   dataFiles.head.getPath.toUri.toString
> } else if (dataFiles.isEmpty) {
>   throw new IOException("No input path specified for libsvm data")
> } else {
>   throw new IOException("Multiple input paths are not supported for libsvm data.")
> }
>
> Files stored on a distributed file system such as HDFS are split into
> multiple pieces, so I think this limit is unnecessary. I'm not sure whether
> this is a bug, or whether I'm using it incorrectly.
>
> thanks a lot ~~~
>
>
>
>


Re: Starter tasks to start contributing

2017-05-22 Thread Yan Facai
Hi,
I think the "starter" label is what you are looking for.

How about this link:

https://issues.apache.org/jira/browse/SPARK-5?jql=project%20=%20SPARK%20%20AND%20component%20in%20%20("Spark%20Core",%20%20"Structured%20Streaming")%20AND%20status%20=%20Open%20AND%20labels%20=%20starter%20ORDER%20BY%20priority%20DESC
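
Decoded, the JQL query is:

project = SPARK AND component in ("Spark Core", "Structured Streaming")
AND status = Open AND labels = starter ORDER BY priority DESC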



On Wed, May 17, 2017 at 4:29 PM, vys6fudl  wrote:

> Hi!
>
> I would like to contribute to Spark since I use it at work. Are there some
> starter tasks related to Spark Core or Spark Streaming that I could work
> on?
> I couldn't find the right search in JIRA, so if someone could point me to
> an existing search or tag, that would be useful as well. The Contributing
> to Spark page mentions JIRA starter tasks, but I couldn't find any.
>
> Thanks!
>
>
>
>
>
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Starter-tasks-to-start-contributing-tp21570.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: how to retain part of the features in LogisticRegressionModel (spark2.0)

2017-03-20 Thread Yan Facai
Hi, jinhong.
Did you call setRegParam? regParam is 0.0 by default, which disables regularization.


Both elasticNetParam and regParam matter when regularization is needed;
internally, the L1 and L2 strengths are derived as:

val regParamL1 = $(elasticNetParam) * $(regParam)
val regParamL2 = (1.0 - $(elasticNetParam)) * $(regParam)
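
For example, a minimal sketch (the parameter values and trainingDF are
illustrative) that turns on a pure L1 penalty so most coefficients are
driven to exactly zero:

import org.apache.spark.ml.classification.LogisticRegression

// elasticNetParam = 1.0 selects a pure L1 penalty; regParam sets its strength.
val lr = new LogisticRegression()
  .setRegParam(0.1)
  .setElasticNetParam(1.0)
val model = lr.fit(trainingDF) // trainingDF has "label" and "features" columns

// Count the coefficients that survive the L1 penalty.
val nonZero = model.coefficients.toArray.count(_ != 0.0)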




On Mon, Mar 20, 2017 at 6:31 PM, Yanbo Liang  wrote:

> Do you want to get a sparse model in which most of the coefficients are
> zeros? If yes, using L1 regularization leads to sparsity. But the
> LogisticRegressionModel coefficients vector's size is still equal to the
> number of features; you can extract the non-zero elements manually. Actually,
> it would be a sparse vector (or a matrix for the multinomial case) if it's
> sparse enough.
>
> Thanks
> Yanbo
>
> On Sun, Mar 19, 2017 at 5:02 AM, Dhanesh Padmanabhan <
> dhanesh12...@gmail.com> wrote:
>
>> It shouldn't be difficult to convert the coefficients to a sparse vector.
>> Not sure if that is what you are looking for.
>>
>> -Dhanesh
>>
>> On Sun, Mar 19, 2017 at 5:02 PM jinhong lu  wrote:
>>
>> Thanks, Dhanesh. And how about the features question?
>>
>> On 19 Mar 2017, at 19:08, Dhanesh Padmanabhan wrote:
>>
>> Dhanesh
>>
>>
>> Thanks,
>> lujinhong
>>
>> --
>> Dhanesh
>> +91-9741125245
>>
>
>