[ https://issues.apache.org/jira/browse/PIG-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-2793:
----------------------------
    Component/s:     (was: parser)
                     (was: grunt)
                     (was: tools)

> Make Pig Work on Windows without Cygwin
> ---------------------------------------
>
>                 Key: PIG-2793
>                 URL: https://issues.apache.org/jira/browse/PIG-2793
>             Project: Pig
>          Issue Type: Bug
>         Environment: Windows without Cygwin as a whole, but with some key
> binaries such as perl, diff, gawk, gzip, sed.
>            Reporter: John Gordon
>
> For Pig to really work well on Windows, it needs Hadoop core changes. Right
> now, those are in progress in branch-1-win. For this work, I am running Pig
> on Windows against branch-1-win and removing Cygwin dependencies as
> capabilities open up. Branch-1-win is fairly stable now, and has opened up
> enough functionality to see the few things needed in Pig to run E2E on top
> of a cross-platform Hadoop core without Cygwin. This uber-JIRA should track
> the whole of the work to get Pig running well on Windows without Cygwin.
>
> There are a few types of work that I think are needed right now (will
> break out sub-JIRAs to track them):
>
> TEST:
> --------
> 1.) Tests that generate Pig script strings with paths in them (e.g.
> dynamically built load/store commands) need to escape the Pig escape
> character ("\"), as it can now occur in both Hadoop and local paths.
> 2.) Tests that generate local temporary files with createTempFile, and then
> try to use those as HDFS paths, need to remove ":" from the generated file
> name to create valid HDFS paths.
> 3.) Tests that hand-generate URIs via string concatenation (e.g. "file:" +
> strFileName) need to use Util.generateURI instead to get a valid URI for
> the target platform.
> 4.) Tests that assume the first line in a script (e.g. #!/bin/sh)
> auto-resolves interpreters need to call the interpreter explicitly (e.g.
> instead of calling "perlscript.pl" they should call "perl perlscript.pl").
> 5.) Differences in quoting or command syntax between shells (e.g. " or ',
> dir or ls) need to be tuned a little here and there.
>
> PROD:
> --------
> 1.) The streaming interface needs to be fixed to run without a Cygwin
> dependency.
> 2.) The pig.additional.jars separator is currently hardcoded to ":", and
> should be File.pathSeparator instead (":" on Linux, ";" on Windows) to be
> able to accept Windows paths (C:\file.jar for instance).
> 3.) The Grunt "sh" command directly exposes the behavior of the exec API.
> If you use a shell built-in, it fails with file not found. This exposes
> many differences between shell implementations (e.g. ls is an exe, but dir
> is a built-in), and many of the cases in TestGrunt end up running (sh bash
> -c "command"). For portability and ease of use, sh should actually exec
> "sh -c <command>" on Linux and "cmd /C <command>" on Windows, so that
> aliases and bat files can be used on either platform and the interface is
> more platform independent for end users.
> 4.) (eventual) Update Pig's dependencies to pick up a stable Hadoop core
> that runs on Windows from a release branch.