This is an automated email from the ASF dual-hosted git repository. myui pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hivemall-site.git
commit eb4c16ed01465b18176f43a018b4fdf07b7015a8 Author: Makoto Yui <m...@apache.org> AuthorDate: Sat Jun 29 01:56:26 2019 +0900 Added a usage of feature_binning UDF --- userguide/ft_engineering/binning.html | 70 ++++++++++++++++++++++++++++++----- 1 file changed, 61 insertions(+), 9 deletions(-) diff --git a/userguide/ft_engineering/binning.html b/userguide/ft_engineering/binning.html index 1d4f235..c0102b3 100644 --- a/userguide/ft_engineering/binning.html +++ b/userguide/ft_engineering/binning.html @@ -2382,10 +2382,11 @@ <!-- toc --><div id="toc" class="toc"> <ul> -<li><a href="#usage">Usage</a><ul> -<li><a href="#feature-vector-trasformation-by-applying-feature-binning">Feature Vector trasformation by applying Feature Binning</a></li> +<li><a href="#data-preparation">Data Preparation</a><ul> +<li><a href="#custom-rule-for-binning">Custom rule for binning</a></li> +<li><a href="#binning-based-on-quantiles">Binning based on quantiles</a></li> <li><a href="#practical-example">Practical Example</a></li> -<li><a href="#get-a-mapping-table-by-feature-binning">Get a mapping table by Feature Binning</a></li> +<li><a href="#create-a-mapping-table-by-feature-binning">Create a mapping table by Feature Binning</a></li> </ul> </li> <li><a href="#function-signatures">Function Signatures</a><ul> @@ -2397,7 +2398,7 @@ </ul> </div><!-- tocstop --> -<h1 id="usage">Usage</h1> +<h1 id="data-preparation">Data Preparation</h1> <p>Prepare sample data (<em>users</em> table) first as follows:</p> <pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">users</span> ( <span class="hljs-keyword">rowid</span> <span class="hljs-built_in">int</span>, <span class="hljs-keyword">name</span> <span class="hljs-keyword">string</span>, age <span class="hljs-built_in">int</span>, gender <span class="hljs-keyword">string</span> @@ -2448,8 +2449,59 @@ </tr> </tbody> </table> -<h2 
id="feature-vector-trasformation-by-applying-feature-binning">Feature Vector trasformation by applying Feature Binning</h2> -<p>Now, converting <code>age</code> values into 3 bins.</p> +<h2 id="custom-rule-for-binning">Custom rule for binning</h2> +<p>You can provide a custom rule for binning as follows:</p> +<pre><code class="lang-sql"><span class="hljs-keyword">select</span> + features <span class="hljs-keyword">as</span> original, + feature_binning( + features, + <span class="hljs-comment">-- [-INF-10.0], (10.0-20.0], (20.0-30.0], (30.0-40.0], (40.0-INF]</span> + <span class="hljs-keyword">map</span>(<span class="hljs-string">'age'</span>, <span class="hljs-built_in">array</span>(-infinity(), <span class="hljs-number">10.0</span>, <span class="hljs-number">20.0</span>, <span class="hljs-number">30.0</span>, <span class="hljs-number">40.0</span>, infinity())) + ) <span class="hljs-keyword">as</span> binned +<span class="hljs-keyword">from</span> + <span class="hljs-keyword">input</span>; +</code></pre> +<table> +<thead> +<tr> +<th style="text-align:left">original</th> +<th style="text-align:left">binned</th> +</tr> +</thead> +<tbody> +<tr> +<td style="text-align:left">["name#Jacob","gender#Male","age:20.0"]</td> +<td style="text-align:left">["name#Jacob","gender#Male","age:1"]</td> +</tr> +<tr> +<td style="text-align:left">["name#Mason","gender#Male","age:22.0"]</td> +<td style="text-align:left">["name#Mason","gender#Male","age:2"]</td> +</tr> +<tr> +<td style="text-align:left">["name#Sophia","gender#Female","age:35.0"]</td> +<td style="text-align:left">["name#Sophia","gender#Female","age:3"]</td> +</tr> +<tr> +<td style="text-align:left">["name#Ethan","gender#Male","age:55.0"]</td> +<td style="text-align:left">["name#Ethan","gender#Male","age:4"]</td> +</tr> +<tr> +<td style="text-align:left">["name#Emma","gender#Female","age:15.0"]</td> +<td style="text-align:left">["name#Emma","gender#Female","age:1"]</td> +</tr> +<tr> +<td 
style="text-align:left">["name#Noah","gender#Male","age:46.0"]</td> +<td style="text-align:left">["name#Noah","gender#Male","age:4"]</td> +</tr> +<tr> +<td style="text-align:left">["name#Isabella","gender#Female","age:20.0"]</td> +<td style="text-align:left">["name#Isabella","gender#Female","age:1"]</td> +</tr> +</tbody> +</table> +<h2 id="binning-based-on-quantiles">Binning based on quantiles</h2> +<p>You can apply feature binning based on <a href="https://en.wikipedia.org/wiki/Quantile" target="_blank">quantiles</a>. </p> +<p>Suppose converting <code>age</code> values into 3 bins:</p> <pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">map</span>(<span class="hljs-string">'age'</span>, build_bins(age, <span class="hljs-number">3</span>)) <span class="hljs-keyword">AS</span> quantiles_map <span class="hljs-keyword">FROM</span> @@ -2458,7 +2510,7 @@ <blockquote> <p>{"age":[-Infinity,18.333333333333332,30.666666666666657,Infinity]}</p> </blockquote> -<p>In the above query result, you can find 4 values for age in <code>quantiles_map</code>. It's a threshold of 3 bins. </p> +<p>In the above query result, you can find 4 values for age in <code>quantiles_map</code>. 
It's a threshold for 3 bins.</p> <pre><code class="lang-sql">WITH bins as ( <span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">map</span>(<span class="hljs-string">'age'</span>, build_bins(age, <span class="hljs-number">3</span>)) <span class="hljs-keyword">AS</span> quantiles_map @@ -2582,7 +2634,7 @@ bins <span class="hljs-keyword">as</span> ( </tr> </tbody> </table> -<h2 id="get-a-mapping-table-by-feature-binning">Get a mapping table by Feature Binning</h2> +<h2 id="create-a-mapping-table-by-feature-binning">Create a mapping table by Feature Binning</h2> <pre><code class="lang-sql">WITH bins AS ( <span class="hljs-keyword">SELECT</span> build_bins(age, <span class="hljs-number">3</span>) <span class="hljs-keyword">AS</span> quantiles <span class="hljs-keyword">FROM</span> <span class="hljs-keyword">users</span> @@ -2777,7 +2829,7 @@ Apache Hivemall is an effort undergoing incubation at The Apache Software Founda <script> var gitbook = gitbook || []; gitbook.push(function() { - gitbook.page.hasChanged({"page":{"title":"Feature Binning","level":"3.4","depth":1,"next":{"title":"Feature Paring","level":"3.5","depth":1,"path":"ft_engineering/pairing.md","ref":"ft_engineering/pairing.md","articles":[{"title":"Polynomial features","level":"3.5.1","depth":2,"path":"ft_engineering/polynomial.md","ref":"ft_engineering/polynomial.md","articles":[]}]},"previous":{"title":"Feature Selection","level":"3.3","depth":1,"path":"ft_engineering/selection.md","ref":"ft [...] + gitbook.page.hasChanged({"page":{"title":"Feature Binning","level":"3.4","depth":1,"next":{"title":"Feature Paring","level":"3.5","depth":1,"path":"ft_engineering/pairing.md","ref":"ft_engineering/pairing.md","articles":[{"title":"Polynomial features","level":"3.5.1","depth":2,"path":"ft_engineering/polynomial.md","ref":"ft_engineering/polynomial.md","articles":[]}]},"previous":{"title":"Feature Selection","level":"3.3","depth":1,"path":"ft_engineering/selection.md","ref":"ft [...] 
}); </script> </div>
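Note on the "Custom rule for binning" example in the diff above: the comment in the added query describes the intervals as [-INF-10.0], (10.0-20.0], (20.0-30.0], (30.0-40.0], (40.0-INF], and the result table maps age 20.0 to bin 1 and age 22.0 to bin 2. The following is a minimal Python sketch of that interval rule, written only to make the boundary behaviour easy to check. It is not Hivemall's implementation; `bin_index` and `AGE_THRESHOLDS` are hypothetical names introduced here, and the right-closed interval handling is an assumption read off the query comment and the table.

```python
from bisect import bisect_left
from math import inf


def bin_index(value, thresholds):
    """Return the 0-based bin i with thresholds[i] < value <= thresholds[i + 1].

    Assumption: intervals are right-closed, as the query comment suggests,
    and the lowest bin also includes its left edge.
    """
    last_bin = len(thresholds) - 2
    # bisect_left finds the first threshold >= value; the bin is one to its left.
    idx = bisect_left(thresholds, value) - 1
    # Clamp so that a value equal to the lowest threshold falls into bin 0.
    return min(max(idx, 0), last_bin)


# Thresholds taken from the custom rule in the diff.
AGE_THRESHOLDS = [-inf, 10.0, 20.0, 30.0, 40.0, inf]

# Ages from the result table: Jacob, Mason, Sophia, Ethan, Emma, Noah.
ages = [20.0, 22.0, 35.0, 55.0, 15.0, 46.0]
binned = [bin_index(age, AGE_THRESHOLDS) for age in ages]
print(binned)  # [1, 2, 3, 4, 1, 4]
```

For these ages the sketch yields the same bin indices as the `binned` column in the table, including the boundary case where 20.0 falls into bin 1 rather than bin 2.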