Re: Platform dependent code placement (was: Re: repo layout again)

2006-03-10 Thread Mark Hindess
On 3/9/06, Oliver Deakin [EMAIL PROTECTED] wrote:
 Time to resurrect this thread again :)

We'll have to try to kill it properly this time. ;-)

 With the work that Mark and I have been doing in HARMONY-183/155/144/171
 we will be at a point soon where all the shared code has been taken out
 of the native-src/win.IA32 and native-src/linux.IA32 directories and
 combined into native-src/shared. Once completed we will be in a good
 position to reorganise the code into whatever layout we choose, and
 refactor the makefiles/scripts to use gmake/ant across both platforms. I
 don't think previous posts on this thread really reached a conclusion, so
 I'll reiterate the previous suggestions:

 1) Hierarchy of source - two suggestions put forward so far:
 - Keep architecture and OS names solely confined to directory names.
 So, for example, we could have:
     src\main\native\
         shared\
         unix\
         windows\
         windows_x86\
         solaris_x86\

   All windows_x86-specific code will be contained under that
   directory, any generic windows code will be under windows\, and code
   common to all platforms will be under shared\ (or whatever name).
   So when looking for a source/header file on, for example, windows
   x86, the compiler would first look in windows_x86, then windows, then
   shared.

 - Alternatively, have directory names as above, but also allow the
 OS and arch to be mixed into file names. To quote Andrey's previous mail [1]:
   Files in the source tree are selected for compilation based on
   the OS or ARCH attribute values which may (or may not) appear in a file
   or directory name.

   Some examples are:

     src\main\native\solaris\foo.cpp
     means the file is applicable to any system running Solaris;

     src\main\native\win\foo_ia32.cpp
     the file is applicable only to Windows / IA32;

     src\main\native\foo_ia32_em64t.cpp
     the file can be compiled for any OS on either the IA32 or EM64T
     architecture, but nothing else.

   Files will be selected using a regular expression involving the OS
   and arch descriptors. This is intended to cut down duplication between
   source directories.

Won't some modules have another level after native, since there are
currently more sub-directories in native-src/linux.IA32 and
native-src/win.IA32 than there are modules?

 Personally I prefer the first system as it is simple to maintain, keeps
 file names consistent and concise and allows developers to easily keep
 track of function location.
 For example, as Graeme pointed out in [2], the developer will always
 know that hyfile_open() is defined in hyfile.c.

 In addition, I don't believe that the second system will give us much of
 a decrease in the number of duplicated files. For example, if a piece of
 code is unique to only linux and windows on x86, will the file be named
 foo_linux_windows_x86.c? How will the build scripts be able to determine
 whether this means all linux platforms plus windows_x86, or windows and
 linux only on x86? In these cases we will either end up duplicating
 foo_x86.c in the windows and linux directories or creating an extra
 directory called x86 which contains foo_windows_linux.c. Potentially we
 will either get similar amounts of duplication, or more directories than
 the first method, and because there is no hard rule on the layout (you
 can choose directory or filenames to include OS/arch) there is no
 guarantee where a developer will choose to put their code in these
 situations.

I don't think we should worry so much.  I think we should simply make
it as complicated as it needs to be for what we have today and let it
evolve when a clear requirement to change comes along.  That means for
today, we might just have:

  linux
  windows
  shared

We shouldn't even split by arch until we know we have to - most of
the current code should be usable on most architectures without
changes or at least easily fixable without duplicating entire files. 
(Thread .asm/.s files being an exception.)

We can decide what to do when something concrete comes up.  If nothing
else it is much easier to reason about a concrete example, than trying
to beat an issue to death when we are all probably envisioning
different future situations.

 2) Build tools - there have been two previous suggestions:
 - Use gmake and VPATH to complement the first layout described
 above. This could lead to platform independent makefiles stored in the
 shared\ directory of each module
   that include platform specifics (such as build file lists,
 compiler flags etc) from a centralised set of resources.

 - Alternatively, use Ant to select the set of files to be compiled
 by employing regex expressions. This sits well with the second layout
 described above 

Re: Platform dependent code placement (was: Re: repo layout again)

2006-03-09 Thread Oliver Deakin

Time to resurrect this thread again :)

With the work that Mark and I have been doing in HARMONY-183/155/144/171 
we will be at a point soon where all the shared code has been taken out 
of the native-src/win.IA32 and native-src/linux.IA32 directories and 
combined into native-src/shared. Once completed we will be in a good 
position to reorganise the code into whatever layout we choose, and 
refactor the makefiles/scripts to use gmake/ant across both platforms. I 
don't think previous posts on this thread really reached a conclusion, so
I'll reiterate the previous suggestions:


1) Hierarchy of source - two suggestions put forward so far:
   - Keep architecture and OS names solely confined to directory names. 
So, for example, we could have:

     src\main\native\
         shared\
         unix\
         windows\
         windows_x86\
         solaris_x86\

     All windows_x86-specific code will be contained under that
     directory, any generic windows code will be under windows\, and code
     common to all platforms will be under shared\ (or whatever name).
     So when looking for a source/header file on, for example, windows
     x86, the compiler would first look in windows_x86, then windows, then
     shared.


   - Alternatively, have directory names as above, but also allow the 
OS and arch to be mixed into file names. To quote Andrey's previous mail [1]:
     Files in the source tree are selected for compilation based on
     the OS or ARCH attribute values which may (or may not) appear in a file
     or directory name.

     Some examples are:

       src\main\native\solaris\foo.cpp
       means the file is applicable to any system running Solaris;

       src\main\native\win\foo_ia32.cpp
       the file is applicable only to Windows / IA32;

       src\main\native\foo_ia32_em64t.cpp
       the file can be compiled for any OS on either the IA32 or EM64T
       architecture, but nothing else.

     Files will be selected using a regular expression involving the OS
     and arch descriptors. This is intended to cut down duplication between
     source directories.


Personally I prefer the first system as it is simple to maintain, keeps 
file names consistent and concise and allows developers to easily keep 
track of function location.
For example, as Graeme pointed out in [2], the developer will always 
know that hyfile_open() is defined in hyfile.c.


In addition, I don't believe that the second system will give us much of
a decrease in the number of duplicated files. For example, if a piece of
code is unique to only linux and windows on x86, will the file be named
foo_linux_windows_x86.c? How will the build scripts be able to determine
whether this means all linux platforms plus windows_x86, or windows and
linux only on x86? In these cases we will either end up duplicating
foo_x86.c in the windows and linux directories or creating an extra
directory called x86 which contains foo_windows_linux.c. Potentially we
will either get similar amounts of duplication, or more directories than
the first method, and because there is no hard rule on the layout (you
can choose directory or filenames to include OS/arch) there is no
guarantee where a developer will choose to put their code in these
situations.
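
To make the ambiguity concrete, here is a small Python sketch (not part of any proposed build script) that spells out the two readings of foo_linux_windows_x86.c as predicates and lists the targets on which they disagree. The function names and the "ppc" target are assumptions made purely for illustration.

```python
# Two plausible readings of the file name foo_linux_windows_x86.c.
# Both functions are hypothetical; neither comes from the proposals.

def reading_a(os_name, arch):
    """(linux or windows), and only on x86."""
    return os_name in ("linux", "windows") and arch == "x86"

def reading_b(os_name, arch):
    """All linux platforms, plus windows on x86."""
    return os_name == "linux" or (os_name == "windows" and arch == "x86")

# Illustrative target list; "ppc" is only here to expose the difference.
targets = [("linux", "x86"), ("linux", "ppc"),
           ("windows", "x86"), ("windows", "ppc")]

# Targets where the two readings would compile a different set of files.
disagreements = [t for t in targets if reading_a(*t) != reading_b(*t)]
print(disagreements)
```

The file name alone cannot tell a build script which predicate was intended, which is the point made above.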



2) Build tools - there have been two previous suggestions:
   - Use gmake and VPATH to complement the first layout described 
above. This could lead to platform independent makefiles stored in the 
shared\ directory of each module
 that include platform specifics (such as build file lists, 
compiler flags etc) from a centralised set of resources.


   - Alternatively, use Ant to select the set of files to be compiled
by employing regular expressions. This sits well with the second layout
described above (although it could also be applied to the first), and a
suitable regular expression has been described by Nikolay in [3].


I prefer the use of gmake here. We can use generic makefiles across 
platforms and pointing the compiler at the right files in the first 
layout above is as easy as setting VPATH to, for example,
windows_x86:windows:shared. I think that complex regex expressions will 
be harder to maintain (and initially understand!).
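
As a rough illustration of the lookup order, the Python sketch below mimics only the "first directory that has the file wins" behaviour of a VPATH such as windows_x86:windows:shared. It is not how gmake is implemented, and the file name hyport.c and the helper names are assumptions for the example.

```python
import os
import tempfile

def resolve(filename, vpath, root):
    """Return the first existing match for filename along a colon-separated path."""
    for directory in vpath.split(":"):
        candidate = os.path.join(root, directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None

def demo():
    """Build a throwaway tree and resolve two files against the search path."""
    with tempfile.TemporaryDirectory() as root:
        # hyfile.c has a windows-specific version; hyport.c exists only in shared.
        for rel in ("windows/hyfile.c", "shared/hyfile.c", "shared/hyport.c"):
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        vpath = "windows_x86:windows:shared"
        return [
            os.path.relpath(resolve(name, vpath, root), root).replace(os.sep, "/")
            for name in ("hyfile.c", "hyport.c")
        ]

print(demo())
```

The more specific directory shadows the shared one, and a file with no platform-specific version falls through to shared, without any name matching on the files themselves.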



Opinions? Once we agree on ideas, perhaps we could put together a 
Wiki/website(?) page describing layout, tools and a list of OS/arch 
names to use.


Oliver Deakin
IBM United Kingdom Limited

[1] 
http://mail-archives.apache.org/mod_mbox/incubator-harmony-dev/200602.mbox/[EMAIL PROTECTED]
[2] 
http://mail-archives.apache.org/mod_mbox/incubator-harmony-dev/200602.mbox/[EMAIL PROTECTED]
[3] 
http://mail-archives.apache.org/mod_mbox/incubator-harmony-dev/200602.mbox/[EMAIL PROTECTED]




Re: Platform dependent code placement (was: Re: repo layout again)

2006-02-28 Thread Graeme Johnson
Mark Hindess [EMAIL PROTECTED] wrote on 02/23/2006 04:06:16 AM:
snip
  I'd suggest that a file be considered platform dependent if it contains
  any of the magic platform keywords (like ia32, linux, etc.) in its
  full name. A directory name may or may not contain a leading name. For
  example, file */linux/*.c should be considered linux specific as
  well. As another example, file */*_linux_solaris_*/*.c is considered
  shared between linux and solaris, but not applicable for win, etc.
 
 I have a few concerns about this plan.
 
 First, that we'll end up renaming relatively simple file names like
 foo_linux_solaris.c to unmanageable things like
 foo_linux_solaris_aix_plan9_freebsd_osx_openbsd_ecos.c.
snip

Andrey's earlier statement about allowing the component to choose the 
names for specializations sounds exactly right.  If you're developing the 
JIT you would want to split along processor lines (e.g. /ia32 /ppc) 
whereas the file-system interface will likely follow operating-system 
lines (e.g. /win32, /linux, /posix).

I'm not convinced about embedding the axis of specialization (OS/ARCH)
in filenames.  It seems like every new component that comes along could
demand a token in the filename.

Based on our experience with J9 we've also seen real value in keeping file
names consistent (e.g. function foo() lives in file bar.c).  This helps
developers form a mental map of where a given piece of functionality
resides, and ultimately makes navigating a large codebase easier.

For example, if a function hyfile_open() is always defined in hyfile.c,
then your task is simply navigating to the correct version of hyfile.c in
the directory tree.  If you play tricks with the filename by appending
suffixes, you become more dependent on external tools like grep or ctags
to locate the right file.

My vote is for consistent file names, in directories whose names are
selected by the component owner.  A list of 'blessed' OS and ARCH values
would go a long way to helping component owners select the right directory
name.

snip 
 Second, if we define everything in terms of high level concepts, such
 as os and arch, then we will be losing the important information about
 the real distinction.  This would make it harder for new developers to
 understand the choices and reasoning embodied in the code.  To avoid
 this, we should really be defining (and using) the actual concept that
 is important in making the distinction about which code to pick up.
snip

Regardless of where you start defining configuration flags (OS and ARCH
seem like a good start to me) a few simple techniques can make your life
easier:

   i) Choose names that are unlikely to conflict with system headers by
  adopting a suitable prefix. For example, if you like HY_ARCH or
  HY_OS, your code would read like:

#define HY_ARCH_IA32   1
#define HY_OS_LINUX    1   /* this flag is turned on */
#define HY_OS_WIN32    0   /* this flag is turned off */

  ii) Produce a list of the blessed names, and the pattern for declaring 
  new ones.  This helps anyone new to the project understand what
  exists already.
 
  iii) Consider using #define with values (either one or zero) so that
  you can use #if tests in the code.  We've found this is slightly
  cleaner than the #ifdef FLAGX or #if defined(FLAGX) flavours.
  For example:

#if HY_ARCH_IA32 && HY_OS_LINUX
/* do something */
#else
/* do something else */
#endif
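
One way to keep such flags consistent is to generate them from the list of blessed names, so that every flag is always defined as either 1 or 0 and plain #if tests stay safe. The Python sketch below is only an illustration of that idea: the generator, its function name and the contents of the blessed lists are assumptions, with only HY_ARCH_IA32, HY_OS_LINUX and HY_OS_WIN32 taken from the example above.

```python
# Hypothetical blessed lists; only these three names appear in the mail above.
BLESSED_ARCH = ("IA32",)
BLESSED_OS = ("LINUX", "WIN32")

def render_flags(os_name, arch):
    """Emit one #define per blessed name, set to 1 for the target and 0 otherwise."""
    lines = []
    for name in BLESSED_ARCH:
        lines.append("#define HY_ARCH_%s %d" % (name, int(name == arch)))
    for name in BLESSED_OS:
        lines.append("#define HY_OS_%s %d" % (name, int(name == os_name)))
    return "\n".join(lines)

print(render_flags("LINUX", "IA32"))
```

Because unselected names are emitted as 0 rather than left undefined, adding a new blessed name in one place updates every configuration.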

my $0.02

Graeme Johnson
J9 VM Team, IBM Canada.

Re: Platform dependent code placement (was: Re: repo layout again)

2006-02-23 Thread Mark Hindess
On 22/02/06, Andrey Chernyshev [EMAIL PROTECTED] wrote:

 On 2/23/06, Matt Benson [EMAIL PROTECTED] wrote:
 
  Are these just sample names? Could there be
  shared/foo_linux.c
  whatever/bar_linux.c
  foo_ia32/bar.c
  bar_linux/baz.c
  baz_linux_ia32/more.c

 Yes, they could. The pattern for identifying architecture or OS
 dependence for a file is like [\W_]${attr}[\W_] where ${attr} stands
 for either specific OS or architecture.

  If so, will a directory always have no more than one
  leading name, i.e. not OS or architecture?

 I'd suggest that a file be considered platform dependent if it contains
 any of the magic platform keywords (like ia32, linux, etc.) in its
 full name. A directory name may or may not contain a leading name. For
 example, file */linux/*.c should be considered linux specific as
 well. As another example, file */*_linux_solaris_*/*.c is considered
 shared between linux and solaris, but not applicable for win, etc.

I have a few concerns about this plan.

First, that we'll end up renaming relatively simple file names like
foo_linux_solaris.c to unmanageable things like
foo_linux_solaris_aix_plan9_freebsd_osx_openbsd_ecos.c.

Second, if we define everything in terms of high level concepts, such
as os and arch, then we will be losing the important information about
the real distinction.  This would make it harder for new developers to
understand the choices and reasoning embodied in the code.  To avoid
this, we should really be defining (and using) the actual concept that
is important in making the distinction about which code to pick up.
For example, defining properties for concepts like "os provides a
getcwd function", "os provides a clock_gettime function", "processor
provides a fast implementation of a quuxo operation", etc.
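
As a loose analogy only, the Python sketch below shows what selecting on a capability rather than on an OS name looks like. The property names and the pick_clock_source helper are invented for illustration; a real native build would probe the C library (for instance with autoconf-style checks) rather than Python modules.

```python
import os
import time

def probe_features():
    """Record what the platform provides, instead of which platform it is."""
    return {
        "os_provides_getcwd": hasattr(os, "getcwd"),
        "os_provides_clock_gettime": hasattr(time, "clock_gettime"),
    }

def pick_clock_source(features):
    """Choose an implementation from the capability, not from an OS name."""
    if features["os_provides_clock_gettime"]:
        return "clock_gettime"
    return "fallback_timer"

features = probe_features()
print(features, pick_clock_source(features))
```

The selection logic never mentions linux or windows, so the reason for each choice stays visible in the code.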

Third, fixing my second problem means we end up implementing autoconf
(and associated tools) in ant.  We might also end up having to
implement new versions of code management tools so that people can
find the right bit of code to edit.

Fourth, if we don't use conventional tools and appropriate, meaningful
code layout, then we will be raising the bar significantly for new
developers considering contributing to the project.

Regards,
 Mark.

--
Mark Hindess [EMAIL PROTECTED]
IBM Java Technology Centre, UK.


Re: Platform dependent code placement (was: Re: repo layout again)

2006-02-22 Thread Andrey Chernyshev
On 2/21/06, Matt Benson [EMAIL PROTECTED] wrote:
 I have tried to reconstruct the gist of this
 discussion from the archives (wasn't paying enough
 attention the first time through), without much luck.
 :)  Since the discussion has evolved this far, I
 wonder if anyone could restate the Ant-specific part
 of the problem in concise terms, with the example
 directory structure and desired selection... ?  in
 case I might tersify the expression at all, I'd like
 to help Harmony in this small way as I've not yet
 found time to do more...

Hi Matt,

Thanks for your attention to this.
I'd like to have a selector in Ant FileSet, which would select file
names based on a regular expression. The regexp needs to be matched
with the string which consists of a path relative to the base dir of a
fileset, plus file name.

For example, suppose we have a set of files like this:
shared\test_linux_ia32.c
shared\test_shared.c
shared\test_win.c
shared\test_win_ia32.c
test_ia32\test1.c
test_linux\test2.c
test_win_ia32\test4.c

Then, for linux/ia32 configuration the selector should take:

shared\test_linux_ia32.c
shared\test_shared.c
test_ia32\test1.c
test_linux\test2.c

Ideally, I'd like to do that with code something like this:

<cc>
  <fileset dir="." includes="**/*.c">
    <and>
      <or>
        <filenameregex expression="[\W_]${env.OS}[\W_]"/>
        <not>
          <filenameregex expression="[\W_](win|linux|solaris)[\W_]"/>
        </not>
      </or>
      <or>
        <filenameregex expression="[\W_]${env.ARCH}[\W_]"/>
        <not>
          <filenameregex expression="[\W_](ia32|sparc|ipf)[\W_]"/>
        </not>
      </or>
    </and>
  </fileset>
</cc>

The above logic exactly describes the layout of platform-dependent
code that I suggested for Harmony.
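
The same selection rule can be checked outside Ant. The Python sketch below applies the [\W_]name[\W_] pattern to the sample file list from this mail for a linux/ia32 configuration. The function names are assumptions, and the sketch only models the rule; it is not the wished-for Ant selector.

```python
import re

KNOWN_OS = ("win", "linux", "solaris")
KNOWN_ARCH = ("ia32", "sparc", "ipf")

def mentions(path, names):
    """True if the path contains one of the names delimited by [\\W_] on both sides."""
    alternatives = "|".join(re.escape(n) for n in names)
    return re.search(r"[\W_](?:%s)[\W_]" % alternatives, path) is not None

def applies(path, current, known):
    """A file applies if it names the current value, or names no known value at all."""
    return mentions(path, (current,)) or not mentions(path, known)

def select(paths, os_name, arch):
    """Keep files that satisfy both the OS rule and the ARCH rule."""
    return [p for p in paths
            if applies(p, os_name, KNOWN_OS) and applies(p, arch, KNOWN_ARCH)]

FILES = [
    r"shared\test_linux_ia32.c",
    r"shared\test_shared.c",
    r"shared\test_win.c",
    r"shared\test_win_ia32.c",
    r"test_ia32\test1.c",
    r"test_linux\test2.c",
    r"test_win_ia32\test4.c",
]

for path in select(FILES, "linux", "ia32"):
    print(path)
```

Note that the pattern needs a delimiter before the name, so a top-level directory called linux\ would only match if the path is given with a leading separator.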

I've tried to use standard filename and containsregex selectors,
but they didn't appear suitable for that purpose.

Thank you,
Andrey Chernyshev
Intel Middleware Products Division


 -Matt

 --- Andrey Chernyshev [EMAIL PROTECTED]
 wrote:
 (a bunch of stuff I snipped ;)




Re: Platform dependent code placement (was: Re: repo layout again)

2006-02-22 Thread Matt Benson
--- Andrey Chernyshev [EMAIL PROTECTED]
wrote:
 On 2/21/06, Matt Benson [EMAIL PROTECTED]
 wrote:
[SNIP]
  wonder if anyone could restate the Ant-specific
 part
  of the problem in concise terms, with the example
  directory structure and desired selection... ?  in
  case I might tersify the expression at all, I'd
 like
  to help Harmony in this small way as I've not yet
  found time to do more...
 
 Hi Matt,
 
 Thanks for your attention to this.
 I'd like to have a selector in Ant FileSet, which
 would select file
 names based on a regular expression. The regexp
 needs to be matched
 with the string which consists of a path relative to
 the base dir of a
 fileset, plus file name.
 
 For example, suppose we have a set of files like
 this:
 shared\test_linux_ia32.c
 shared\test_shared.c
 shared\test_win.c
 shared\test_win_ia32.c
 test_ia32\test1.c
 test_linux\test2.c
 test_win_ia32\test4.c
 
 Then, for linux/ia32 configuration the selector
 should take:
 
 shared\test_linux_ia32.c
 shared\test_shared.c
 test_ia32\test1.c
 test_linux\test2.c
 

Are these just sample names? Could there be
shared/foo_linux.c
whatever/bar_linux.c
foo_ia32/bar.c
bar_linux/baz.c
baz_linux_ia32/more.c

If so, will a directory always have no more than one
leading name, i.e. not OS or architecture?

Thanks,
Matt



Re: Platform dependent code placement (was: Re: repo layout again)

2006-02-22 Thread Andrey Chernyshev
On 2/23/06, Matt Benson [EMAIL PROTECTED] wrote:
 Are these just sample names? Could there be
 shared/foo_linux.c
 whatever/bar_linux.c
 foo_ia32/bar.c
 bar_linux/baz.c
 baz_linux_ia32/more.c

Yes, they could. The pattern for identifying architecture or OS
dependence for a file is like [\W_]${attr}[\W_] where ${attr} stands
for either specific OS or architecture.


 If so, will a directory always have no more than one
 leading name, i.e. not OS or architecture?

I'd suggest that a file be considered platform dependent if it contains
any of the magic platform keywords (like ia32, linux, etc.) in its
full name. A directory name may or may not contain a leading name. For
example, file */linux/*.c should be considered linux specific as
well. As another example, file */*_linux_solaris_*/*.c is considered
shared between linux and solaris, but not applicable for win, etc.

Thank you,
Andrey Chernyshev
Intel Middleware Products Division





Re: Platform dependent code placement (was: Re: repo layout again)

2006-02-22 Thread Matt Benson
--- Andrey Chernyshev [EMAIL PROTECTED]
wrote:

 On 2/23/06, Matt Benson [EMAIL PROTECTED]
 wrote:
  Are these just sample names? Could there be
  shared/foo_linux.c
  whatever/bar_linux.c
  foo_ia32/bar.c
  bar_linux/baz.c
  baz_linux_ia32/more.c
 
 Yes, they could. The pattern for identifying architecture or OS
 dependence for a file is like [\W_]${attr}[\W_], where ${attr} stands
 for either a specific OS or architecture.

  If so, will a directory always have no more than one
  leading name, i.e. not OS or architecture?

 I'd suggest that a file be considered platform dependent if it contains
 any of the magic platform keywords (like ia32, linux, etc.) in its
 full name. A directory name may or may not contain a leading name. For
 example, file */linux/*.c should be considered linux specific as
 well. As another example, file */*_linux_solaris_*/*.c is considered
 shared between linux and solaris, but not applicable for win, etc.

Ah... I hadn't extrapolated the linux_solaris
possibility.  The reason I asked my last
question--i.e. will there always be foo_os, foo_arch,
foo_os_arch as opposed to foo_bar_os, foo_bar_arch,
foo_bar_os_arch--is to learn more about how to
differentiate between foo_ia32 and foo_win_ia32.  The
reason being that the combination of linux/ia32 can't
just blindly include any file/dir with ia32 in the
name or it could pick up e.g. foo_win_ia32... can you
confirm there would be no reason for
foo_bar_(os_arch|os|arch)?
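
The concern can be shown with a few lines of Python. In this sketch (helper name and the exact directory list are assumptions built from the sample names in this thread), matching on the architecture alone picks up the windows directory, while also applying the OS rule drops it.

```python
import re

def has_token(path, names):
    """True if the path contains one of the names delimited by [\\W_] on both sides."""
    return re.search(r"[\W_](?:%s)[\W_]" % "|".join(names), path) is not None

paths = ["foo_ia32/bar.c", "foo_win_ia32/bar.c", "foo_linux_ia32/bar.c"]

# Naive rule for linux/ia32: take anything that mentions ia32.
naive = [p for p in paths if has_token(p, ["ia32"])]

# Combined rule: additionally require linux, or no OS name at all.
combined = [p for p in naive
            if has_token(p, ["linux"])
            or not has_token(p, ["win", "linux", "solaris"])]

print(naive)
print(combined)
```

So the arch test is only safe when it is combined with the OS test on the same path.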

-Matt

 


Re: Platform dependent code placement (was: Re: repo layout again)

2006-02-21 Thread Nikolay Kuznetsov
Hello, team,

I've tried to simplify the construction below, which is a sample of
Andrey's Ant script, and ended up with the following regular expression,
which matches strings containing a particular OS identifier or strings
without any OS identifiers:

.*([_\\W]${env.OS}[_\\W].*$|.*(?<!.*[\\W_](win|lin|sol)[\\W_].*)$)

This would work fine with the regex package from the HARMONY-39
contribution, but fails to compile with Sun's classes
(PatternSyntaxException: Look-behind group does not have an obvious
maximum length). I would appreciate it if someone could point me to the
place in any regex specification stating that this is valid behavior.

From a compatibility point of view this enhancement is no good, but as
a hint on how to implement negative assertions in a regex, negative
look-behind/look-ahead is the solution.

<propertyregex property="OS.match" input="@{file}"
 regexp="[\W_]${env.OS}[\W_]" override="yes" defaultValue="no"
 select="yes"/>
<propertyregex property="OS.any.match" input="@{file}"
 regexp="[\W_](win|linux|solaris)[\W_]" override="yes"
 defaultValue="no" select="yes"/>

<istrue value="${OS.match}"/>
<not>
<istrue value="${OS.any.match}"/>
</not>
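
For comparison, the single-expression idea can be written with a negative look-ahead anchored at the start of the string, which avoids the unbounded look-behind altogether. The Python sketch below is one such formulation and is an assumption, not the expression from this mail; Python's re module, as far as I know, also rejects variable-width look-behind.

```python
import re

KNOWN_OS = ("win", "linux", "solaris")

def os_pattern(current_os):
    """Match paths that name current_os, or that name no known OS at all."""
    any_os = "|".join(KNOWN_OS)
    return re.compile(
        r"^(?:.*[\W_]%s[\W_]|(?!.*[\W_](?:%s)[\W_]))"
        % (re.escape(current_os), any_os)
    )

linux = os_pattern("linux")

for path in (r"shared\test_linux_ia32.c",
             r"shared\test_shared.c",
             r"shared\test_win.c"):
    print(path, bool(linux.match(path)))
```

The first alternative accepts a path that mentions the current OS; the second accepts a path only if no known OS name can be found anywhere in it.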

Thank you.

Nikolay Kuznetsov
Intel Middleware Products Division


Re: Platform dependent code placement (was: Re: repo layout again)

2006-02-20 Thread Andrey Chernyshev
On 2/17/06, Tim Ellison [EMAIL PROTECTED] wrote:
 do you have any examples (i.e. snippets of a non-trivial Ant script)
 that show what it would end up like?  I'm trying to figure out in my own
 head whether it would be a few general regex selectors, or a load of
 them!  I think you may be able to do it with just a few, right?

Actually not as few as I would wish. The selectors themselves are 4
lines, plus ~15 lines of logic ops to combine them. However, the rest
of the code, which does the necessary conversions, is big. In the end,
the code snippet that worked for me looks like this:

<!-- Translate fileset into plain string first -->
<pathconvert property="source.files" pathsep=",">
    <path>
        <fileset dir="${basedir}" includes="**/*.c"/>
    </path>
    <map from="${basedir}" to=""/>
</pathconvert>

<!-- Filter plain string using regex -->
<for list="${source.files}" param="file">
    <sequential>
        <propertyregex property="OS.match" input="@{file}"
            regexp="[\W_]${env.OS}[\W_]" override="yes" defaultValue="no"
            select="yes"/>
        <propertyregex property="OS.any.match" input="@{file}"
            regexp="[\W_](win|linux|solaris)[\W_]" override="yes"
            defaultValue="no" select="yes"/>
        <propertyregex property="ARCH.match" input="@{file}"
            regexp="[\W_]${env.ARCH}[\W_]" override="yes" defaultValue="no"
            select="yes"/>
        <propertyregex property="ARCH.any.match" input="@{file}"
            regexp="[\W_](ia32|sparc|ipf)[\W_]" override="yes" defaultValue="no"
            select="yes"/>
        <if>
            <and>
                <or>
                    <istrue value="${OS.match}"/>
                    <not>
                        <istrue value="${OS.any.match}"/>
                    </not>
                </or>
                <or>
                    <istrue value="${ARCH.match}"/>
                    <not>
                        <istrue value="${ARCH.any.match}"/>
                    </not>
                </or>
            </and>
            <then>
                <var name="filetered.files" value="${filetered.files},@{file}"/>
            </then>
        </if>
    </sequential>
</for>

<!-- Convert string back to fileset -->
<path id="filetered.files.path">
    <filelist dir="." files="${filetered.files}"/>
</path>
<pathtofileset name="filetered.files.fileset"
    pathrefid="filetered.files.path"
    dir="${basedir}"/>

<!-- Call C compile task with the resulting fileset -->
<cc objdir="out">
    <fileset refid="filetered.files.fileset"/>
</cc>
</target>

It isn't very small because the original Ant fileset regexp
selectors don't work with directory names, hence one has to apply
some magic (convert the fileset to a string, filter it using a regexp,
and then convert it back to files). I imagine that the above code would
look much simpler if we wrote a custom regexp selector class.

The reason why it doesn't look too simple is that we allow an
arbitrary delimiter like [\W_] for OS/ARCH attributes, and we catch
partially shared file names like linux_solaris.

I haven't yet tried to implement the same logic with make; would it be
much simpler?

Thanks,
Andrey.


 Regards,
 Tim

  Using the names consistently will definitely help, but choosing whether
  to create a separate copy of the file in a platform-specific
  sub-directory, or to use #define's within a file in a shared-family
  sub-directory will likely come down to a case by case decision.  For
  example, 32-bit vs. 64-bit code may be conveniently #ifdef'ed in some .c
  files, but a .h file that defines pointer types etc. may need different
  versions of the entire file to keep things readable.
 
  Yes, I agree. This is why I suggest to keep both selection mechanisms
  - sometimes #define is more efficient, and sometimes dir/filename is
  much more clear.
 
  Finally, I'd suggest that the platform dependent code can be organized
  in 3 different ways:
 
  (1) Explicitly, via defining the appropriate file list. For example,
  Ant xml file may choose either one or another fileset, depending on
  the current OS and ARCH property values. This approach is most
  convenient, for example,  whenever a third-party code is compiled or
  the file names could not be changed for some reason.
  Ant ?!  ;-)  or platform-specific makefile #includes?
 
  Let's consider both for now :)
 
  There will be files that it makes sense to share for sure (like vmi.h
  and jni.h etc.) but they should be stable-API types that can be
  refreshed across the boundary as required if necessary.
 
  Agreed. I think it would be great if  we can keep our inter-component
  interfaces (like vmi.h) platform independent.
 
 
  Thank you,
  Andrey Chernyshev
  Intel Middleware Products Division
 
 
  Hence, the most efficient (in terms of code
  sharing and readability) code placement would require a maximum
  flexibility, though preserving some well-defined rules. The scheme
  based on file dir/name matching seems to be flexible enough.
 
  How does the above proposal sound?
  Cool, perhaps 

Re: Platform dependent code placement (was: Re: repo layout again)

2006-02-17 Thread Oliver Deakin

Tim Ellison wrote:

Andrey Chernyshev wrote:
  
  

snip

On the other hand, having a separate source trees like linux32.sparc,
solaris64.sparc, win.IA32 for each specific platform combination may
lead to a huge code duplication. We may need to be able to share the
code through the certain, but not through all platform combinations.



Agreed.  The existing code layout for the classlib natives is certainly
not a viable way to scale across multiple platforms.

(The 'in-house' mechanism for managing multi-platform code is particular
to IBM so not of great interest here, suffice to say that the win.IA32
and linux.IA32 trees in classlib/trunk/native-code are the product of
that mechanism with some manual tidy-up).

  
Also agree. The current layout will not scale well when we move to a 
broader range of platforms.



To address that issue, I can suggest a pretty straightforward scheme
for platform-dependent code placement which looks as follows:

1. There is a fixed set of attributes which denotes a specific target
configuration. As a starter set, we may have OS (for operating system)
and, say ARCH (for architecture) attributes. This set can be extended
later but, as was suggested, let's not cross that bridge until we
come to it.



Yes, the principal distinction is probably on OS & ARCH.

  

2. Files in the source tree are selected for compilation based on the
OS or ARCH attribute values which may (or may not) appear in a file or
directory name.
Some examples are:

src\main\native\solaris\foo.cpp
- means file is applicable for whatever system running Solaris;



yep (that was foo.c, right ;-) -- only teasing)

  

src\main\native\win\foo_ia32.cpp
- file is applicable only for  Windows / IA32;



why has the ARCH flipped onto the file name?  why not win_ia32 ?

  

src\main\native\foo_ia32_em64t.cpp
- file can be compiled for whatever OS on either IA32 or EM64T
architecture, but nothing else.



I agree with the approach, but left wondering why it is not something like:
   src\main\native\
   common\
   unix\
   windows\
   zos\
   solaris\
   solaris_x86\
   solaris_sparc\
   windows_ifp\

i.e. a taxonomy covering families of code (common, unix-like,
windows-like) and increasingly specific discriminators.
  
The idea is good, however I think including both the OS and arch in the 
directory name is preferable.
It is just as simple a convention, gives the coder an at-a-glance view
of which OS/arch's have platform specific code associated with them
and keeps the actual source filenames consistent across platforms.

Was there a particular reason for attaching the architecture to the 
filename and not the directory Andrey?
  

The formal file selection rule may look like:

(1) File is applicable for the given OS value if its pathname contains regexp
[\W_]${OS}[\W_], or pathname doesn't contain any OS value;

(2) File is applicable for the given ARCH value if its pathname contains regexp
[\W_]${ARCH}[\W_], or pathname doesn't contain any ARCH value;

(3) File is selected for a compilation if it satisfies both (1) and
(2) criteria.
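
For illustration, rules (1) to (3) could be sketched in C roughly as below. This is only a sketch under assumptions: the lists of known OS and ARCH values are hypothetical (the thread has not fixed them), "contains any OS value" is read as "contains a delimited OS value", and the delimiter class [\W_] is taken to mean any character that is not a letter or digit.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute values; the thread does not define the real sets. */
static const char *const KNOWN_OS[]   = { "win", "linux", "solaris" };
static const char *const KNOWN_ARCH[] = { "ia32", "em64t", "sparc" };

/* The class [\W_]: any character that is not a letter or a digit. */
static bool is_delim(char c)
{
    return !isalnum((unsigned char)c);
}

/* True if path contains value with a delimiter character on each side. */
static bool has_token(const char *path, const char *value)
{
    size_t n = strlen(value);
    for (const char *p = strstr(path, value); p != NULL;
         p = strstr(p + 1, value)) {
        if (p > path && is_delim(p[-1]) && p[n] != '\0' && is_delim(p[n]))
            return true;
    }
    return false;
}

/* Rules (1) and (2): the path names the target value, or names no value. */
static bool applies(const char *path, const char *target,
                    const char *const known[], size_t count)
{
    if (has_token(path, target))
        return true;
    for (size_t i = 0; i < count; i++)
        if (has_token(path, known[i]))
            return false;
    return true;
}

/* Rule (3): the file is compiled only if both attributes accept it. */
bool is_selected(const char *path, const char *os, const char *arch)
{
    return applies(path, os, KNOWN_OS,
                   sizeof KNOWN_OS / sizeof KNOWN_OS[0])
        && applies(path, arch, KNOWN_ARCH,
                   sizeof KNOWN_ARCH / sizeof KNOWN_ARCH[0]);
}
```

With these assumptions, src\main\native\win\foo_ia32.cpp is selected for win/ia32 but not for win/em64t, and a directory called windows is not mistaken for the OS value win.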



If we restrict the OS and ARCH identifiers to directories then it will
allow us to use the gmake VPATH functionality to select the right file,
so compiling on solaris x86 will have a
VPATH='solaris_x86:solaris:unix:common' and so on.
  
I agree that is a perfect scenario to use VPATH for. I think this would 
probably be a simpler solution
than using ant (as suggested later) and also would not require you to 
have a JVM to build the native code.
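
To make the lookup order concrete, here is a small C model of a first-match directory search of the kind VPATH gives (GNU make consults the VPATH directories, in order, for a file it does not find in the current directory). It is a sketch only: the in-memory TREE table, the file names and the linux_x86 search path are hypothetical and merely stand in for a real source tree.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tree contents, standing in for the real file system. */
static const char *const TREE[] = {
    "common/foo.c",
    "common/bar.c",
    "unix/bar.c",
    "solaris_x86/bar.c",
};

/* Search orders, most specific directory first. */
const char *const SOLARIS_X86_PATH[] = { "solaris_x86", "solaris", "unix", "common" };
const char *const LINUX_X86_PATH[]   = { "linux_x86", "linux", "unix", "common" };

/* True if the model tree holds dir/file. */
static int tree_has(const char *dir, const char *file)
{
    size_t dlen = strlen(dir);
    for (size_t i = 0; i < sizeof TREE / sizeof TREE[0]; i++) {
        const char *p = TREE[i];
        if (strncmp(p, dir, dlen) == 0 && p[dlen] == '/'
            && strcmp(p + dlen + 1, file) == 0)
            return 1;
    }
    return 0;
}

/* Returns the first directory in search order that holds file, or NULL. */
const char *resolve_dir(const char *const dirs[], size_t ndirs,
                        const char *file)
{
    for (size_t i = 0; i < ndirs; i++)
        if (tree_has(dirs[i], file))
            return dirs[i];
    return NULL;
}
```

In this model bar.c resolves to solaris_x86 on Solaris x86 and to unix on Linux x86, while foo.c falls through to common on both, which is the sharing behaviour the directory taxonomy is meant to provide.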


  

One can see that this naming convention gives developers enough
freedom to layout their code in a most convenient way (actually,
experience shows that the meaning of convenient may differ
significantly depending on a component type :). On the other hand, it
gives well defined (and hopefully intuitive enough) rule showing
whether the particular file is picked up by the compiler or not,
depending on a configuration.



I like the idea -- if we agree to use gmake throughout then I think we
get this functionality 'for free'.

  

In addition to the above, the code could also be selected for
compilation by means of #defines directives in C/C++ files (it is
convenient when the most of a file is platform-independent, with the
exception of just a few lines of code). The building system could set
up the OS and ARCH attributes as appropriate defines for the C/C++
code.
For example, for Windows/IA32 config, the following defines could be set:

 #define OS WIN
 #define WIN
 #define ARCH IA32
 #define IA32

Then the platform-dependent code sections may look like:

#ifdef WIN
….
#endif

which is essentially same as:

#if OS == WIN
….
#endif
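
One caveat with the defines as quoted: because WIN expands to nothing, "#if OS == WIN" expands to "#if ==", which a conforming preprocessor rejects. A possible variant, sketched below, gives each flag the value 1 so that both forms work. The os_name and is_win_by_ifdef functions are hypothetical helpers for illustration, and in a real build the defines would presumably come from compiler flags rather than from the source file.

```c
/* Assumed to arrive from the build system, e.g. -DWIN=1 -DOS=WIN. */
#define WIN  1
#define OS   WIN
#define IA32 1
#define ARCH IA32

/* Value-comparison form: works because WIN now has a value. */
const char *os_name(void)
{
#if OS == WIN          /* expands to 1 == 1 */
    return "win";
#elif OS == LINUX      /* LINUX is undefined here, so it evaluates as 0 */
    return "linux";
#else
    return "unknown";
#endif /* OS */
}

/* Flag form: unchanged from the original proposal. */
int is_win_by_ifdef(void)
{
#ifdef WIN
    return 1;
#else  /* !WIN */
    return 0;
#endif /* WIN */
}
```

The comparison relies on the rule that an identifier left undefined is treated as 0 inside #if, so only the flags for the current platform should be defined.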

It is important that OS/ARCH (or whatever additional) attribute names
and values are used consistently in the file names and 

Re: Platform dependent code placement (was: Re: repo layout again)

2006-02-17 Thread Andrey Chernyshev
  src\main\native\win\foo_ia32.cpp
  - file is applicable only for  Windows / IA32;

 why has the ARCH flipped onto the file name?  why not win_ia32 ?

Well, let's see - if I have a file which is shared between Windows and
Linux, but it is IA32 specific, then I'll have to duplicate it in
win_ia32 and linux_ia32 dirs. It means having both ARCH and OS in a
file name isn't always convenient.

Another case: if I have only one file in my component which is
IA32-specific, it could be more convenient just to rename it, like
foo.c -> foo_ia32.c, and keep it at the same location, rather than move
it to some other directory. One problem that arises here is that
every additional directory may need to be registered appropriately as
a source search path in development / debugging tools (you'd face this
if you try to debug with MSVC, for example). I just thought that
giving the freedom to choose between naming files or directories will
help people to work in the most convenient way.


  src\main\native\foo_ia32_em64t.cpp
  - file can be compiled for whatever OS on either IA32 or EM64T
  architecture, but nothing else.

 I agree with the approach, but left wondering why it is not something like:
   src\main\native\
   common\
   unix\
   windows\
   zos\
   solaris\
   solaris_x86\
   solaris_sparc\
   windows_ifp\

 i.e. a taxonomy covering families of code (common, unix-like,
 windows-like) and increasingly specific discriminators.

Well, this directory structure does fit the scheme I proposed; it
is a particular case of it.  Some people will probably also want to
play with the file names within a single directory in the same style:
foo_solaris.c, foo_solaris_sparc.c, ...
I guess if  a component contains only 3 platform dependent c files,
someone would be frustrated to create 3 different directories for
them.


  The formal file selection rule may look like:
 
  (1) File is applicable for the given OS value if its pathname contains 
  regexp
  [\W_]${OS}[\W_], or pathname doesn't contain any OS value;
 
  (2) File is applicable for the given ARCH value if its pathname contains 
  regexp
  [\W_]${ARCH}[\W_], or pathname doesn't contain any ARCH value;
 
  (3) File is selected for a compilation if it satisfies both (1) and
  (2) criteria.

 If we restrict the OS and ARCH identifiers to directories then it will
 allow us to use the gmake VPATH functionality to select the right file,
 so compiling on solaris x86 will have a
 VPATH='solaris_x86:solaris:unix:common' and so on.

I see. Possibly vpath (small letters) could address the filenames?
Perhaps something like this should work:
vpath %solaris%.c
vpath %.c  solaris:unix:common

 I like the idea -- if we agree to use gmake throughout then I think we
 get this functionality 'for free'.

I guess the same could be done relatively easy for Ant as well, with
help of filesets and containsregex selectors.

 Using the names consistently will definitely help, but choosing whether
 to create a separate copy of the file in a platform-specific
 sub-directory, or to use #define's within a file in a shared-family
 sub-directory will likely come down to a case by case decision.  For
 example, 32-bit vs. 64-bit code may be conveniently #ifdef'ed in some .c
 files, but a .h file that defines pointer types etc. may need different
 versions of the entire file to keep things readable.

Yes, I agree. This is why I suggest to keep both selection mechanisms
- sometimes #define is more efficient, and sometimes dir/filename is
much more clear.


  Finally, I'd suggest that the platform dependent code can be organized
  in 3 different ways:
 
  (1) Explicitly, via defining the appropriate file list. For example,
  Ant xml file may choose either one or another fileset, depending on
  the current OS and ARCH property values. This approach is most
  convenient, for example,  whenever a third-party code is compiled or
  the file names could not be changed for some reason.

 Ant ?!  ;-)  or platform-specific makefile #includes?

Let's consider both for now :)


 There will be files that it makes sense to share for sure (like vmi.h
 and jni.h etc.) but they should be stable-API types that can be
 refreshed across the boundary as required if necessary.

Agreed. I think it would be great if  we can keep our inter-component
interfaces (like vmi.h) platform independent.


Thank you,
Andrey Chernyshev
Intel Middleware Products Division



  Hence, the most efficient (in terms of code
  sharing and readability) code placement would require a maximum
  flexibility, though preserving some well-defined rules. The scheme
  based on file dir/name matching seems to be flexible enough.
 
  How does the above proposal sound?

 Cool, perhaps we can discuss if it should be gmake + vpath or ant.

 Thanks for resurrecting this thread.

 Regards,
 Tim


  Maybe in some components we 

Re: Platform dependent code placement (was: Re: repo layout again)

2006-02-17 Thread Andrey Chernyshev
 The idea is good, however I think including both the OS and arch in the
 directory name is preferable.
 It is just as simple a convention, gives the coder an at-a-glance view
 of which OS/arch's have platform specific code associated with them
 and keeps the actual source filenames consistent across platforms.

 Was there a particular reason for attaching the architecture to the
 filename and not the directory Andrey?

I think there is no particular reason except just a convenience. 
People may not wish to create extra directory and go there each time
because of a few platform-dependent files. Also, as I mentioned in my
previous message, every extra source directory may additionally
complicate the setup of debugging tools and IDE's.

 I agree that is a perfect scenario to use VPATH for. I think this would
 probably be a simpler solution
 than using ant (as suggested later) and also would not require you to
 have a JVM to build the native code.

We have lots of Java code in the source base, right? :) Therefore one
will need a JVM anyway to be able to build something runnable.

 This is a tricky one. I think in most cases the difference between
 32/64bit code should be minor and
 mostly confined to header defines as Tim suggests. For this ifdef's will

Let's not forget about the different OSes and architectures. For
example, I'd expect significant differences in the implementations of AWT
on Win/X11 and of the JIT compiler on IA32/Sparc/IPF. The whole design of
the code could be different, not just the implementations of certain
functions or classes. Using only #defines could be almost a nightmare in
this case...

 be sufficient. I would simply suggest
 that we adopt a policy of always marking all #else and #endif's clearly
 to indicate which condition
 they relate to.
 However, there may be instances where using ifdef's obfuscates the code.
 I think most of the time this
 will be a judgement call on the part of the coder - if you look at a
 piece of code and cannot tell what
 the preprocessor is going to give you on a particular platform, you're
 probably looking at a candidate
 for code separation.

I agree, this seems to be a good criterion for choosing between defines
and dir/file names.
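
As an illustration of the policy of marking every #else and #endif with its condition, a header fragment might look like the sketch below. HY_64BIT, hy_word and round_up_to_word are hypothetical names invented for this example; they are not taken from the Harmony sources.

```c
#include <stddef.h>
#include <stdint.h>

/* HY_64BIT is a hypothetical flag the build would set on 64-bit targets. */
#if defined(HY_64BIT)
typedef uint64_t hy_word;
#define HY_WORD_BITS 64
#else  /* !HY_64BIT */
typedef uint32_t hy_word;
#define HY_WORD_BITS 32
#endif /* HY_64BIT */

/* Rounds a byte count up to a whole number of words. */
size_t round_up_to_word(size_t bytes)
{
    return (bytes + sizeof(hy_word) - 1) / sizeof(hy_word) * sizeof(hy_word);
}
```

With only one condition and the branches labelled, it is easy to tell what the preprocessor produces on each platform; once that stops being true, the file is probably a candidate for separate per-platform copies.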


Thank you,
Andrey Chernyshev
Intel Middleware Products Division


 
  Finally, I'd suggest that the platform dependent code can be organized
  in 3 different ways:
 
  (1) Explicitly, via defining the appropriate file list. For example,
  Ant xml file may choose either one or another fileset, depending on
  the current OS and ARCH property values. This approach is most
  convenient, for example,  whenever a third-party code is compiled or
  the file names could not be changed for some reason.
 
 
  Ant ?!  ;-)  or platform-specific makefile #includes?
 
 
  (2) Via the file path naming convention. This is the preferred
  approach and works well whenever distinctive files for different
  platforms can be identified.
 
 
  yep (modulo discussion of filenames vs. dir names to enable vpath)
 
 
  (3) By means of the preprocessor directives. This could be convenient
  if only few lines of code need to vary across the platforms. However,
  preprocessor directives would make the code less readable, hence this
  should be used with care.
 
  In terms of building process, it means that the code has to pass all 3
  stages of filtering before it is selected for the compilation.
 
 
  I like it.  Let's just discuss what tools do the selection -- but I
  agree with the approach.
 
 
  The point is that components at Harmony could be very different,
  especially if we take into account that they may belong both to Class
  Libraries and VM world.
 
 
  There will be files that it makes sense to share for sure (like vmi.h
  and jni.h etc.) but they should be stable-API types that can be
  refreshed across the boundary as required if necessary.
 
 
  Hence, the most efficient (in terms of code
  sharing and readability) code placement would require a maximum
  flexibility, though preserving some well-defined rules. The scheme
  based on file dir/name matching seems to be flexible enough.
 
  How does the above proposal sound?
 
 
 
 Sounds good :) It makes a lot of sense to organise the code in a way
 that promotes reuse across platforms.
 +1 from me


 --
 Oliver Deakin
 IBM United Kingdom Limited



  Cool, perhaps we can discuss if it should be gmake + vpath or ant.
 
  Thanks for resurrecting this thread.
 
  Regards,
  Tim
 
 
 
  Maybe in some components we would want to include a window manager
  family too, though let's cross that bridge...
 
  I had a quick hunt round for a recognized standard or convention for OS
  and CPU family names, but it seems there are enough subtle differences
  around that we should just define them for ourselves.
 
 
  My VM's config script maintains CPU type, OS name, and word size as three
  independent values.  These are combined in various ways in the source code
  and support scripts depending on the 

Re: Platform dependent code placement (was: Re: repo layout again)

2006-02-17 Thread Tim Ellison
Andrey Chernyshev wrote:
 src\main\native\win\foo_ia32.cpp
 - file is applicable only for  Windows / IA32;
 why has the ARCH flipped onto the file name?  why not win_ia32 ?
 
 Well, let's see - if I have a file which is shared between Windows and
 Linux, but it is IA32 specific, then I'll have to duplicate it in
 win_ia32 and linux_ia32 dirs. It means having both ARCH and OS in a
 file name isn't always convenient.
 
 Another case, if I have only one file in my component which is
 IA32-specific, it could be more convenient just to rename it like
 foo.c -> foo_ia32.c and keep it at the same location, rather than move
 to some other directory. One sort of problems coming here is that
 every additional directory may need to be registered appropriately as
 a source search path in development / debugging tools (you'd face this
 if you try to debug with MSVC, for example). I just thought that
 giving a freedom to choose between naming files or directories will
 help people to work in the most convenient way.
 
 src\main\native\foo_ia32_em64t.cpp
 - file can be compiled for whatever OS on either IA32 or EM64T
 architecture, but nothing else.
 I agree with the approach, but left wondering why it is not something like:
   src\main\native\
   common\
   unix\
   windows\
   zos\
   solaris\
   solaris_x86\
   solaris_sparc\
   windows_ifp\

 i.e. a taxonomy covering families of code (common, unix-like,
 windows-like) and increasingly specific discriminators.
 
 Well, this directory structure does fit to the scheme I proposed, it
 is a particular case of it.  Some people probably will want also to
 play with the file names within a single directory in the same style: 
 foo_solaris.c, foo_solaris_sparc.c, ...

Ah, I see.  I hadn't appreciated that you can mix-n-match the dir names
and file names encoding.

 I guess if  a component contains only 3 platform dependent c files,
 someone would be frustrated to create 3 different directories for
 them.
 
 The formal file selection rule may look like:

 (1) File is applicable for the given OS value if its pathname contains 
 regexp
 [\W_]${OS}[\W_], or pathname doesn't contain any OS value;

 (2) File is applicable for the given ARCH value if its pathname contains 
 regexp
 [\W_]${ARCH}[\W_], or pathname doesn't contain any ARCH value;

 (3) File is selected for a compilation if it satisfies both (1) and
 (2) criteria.
 If we restrict the OS and ARCH identifiers to directories then it will
 allow us to use the gmake VPATH functionality to select the right file,
 so compiling on solaris x86 will have a
 VPATH='solaris_x86:solaris:unix:common' and so on.
 
 I see. Possibly vpath (small letters) could address the filenames?
 Perhaps something like this should work:
 vpath %solaris%.c
 vpath %.c  solaris:unix:common
 
 I like the idea -- if we agree to use gmake throughout then I think we
 get this functionality 'for free'.
 
 I guess the same could be done relatively easy for Ant as well, with
 help of filesets and containsregex selectors.

do you have any examples (i.e. snippets of a non-trivial Ant script)
that show what it would end up like?  I'm trying to figure out in my own
head whether it would be a few general regex selectors, or a load of
them!  I think you may be able to do it with just a few, right?

Regards,
Tim

 Using the names consistently will definitely help, but choosing whether
 to create a separate copy of the file in a platform-specific
 sub-directory, or to use #define's within a file in a shared-family
 sub-directory will likely come down to a case by case decision.  For
 example, 32-bit vs. 64-bit code may be conveniently #ifdef'ed in some .c
 files, but a .h file that defines pointer types etc. may need different
 versions of the entire file to keep things readable.
 
 Yes, I agree. This is why I suggest to keep both selection mechanisms
 - sometimes #define is more efficient, and sometimes dir/filename is
 much more clear.
 
 Finally, I'd suggest that the platform dependent code can be organized
 in 3 different ways:

 (1) Explicitly, via defining the appropriate file list. For example,
 Ant xml file may choose either one or another fileset, depending on
 the current OS and ARCH property values. This approach is most
 convenient, for example,  whenever a third-party code is compiled or
 the file names could not be changed for some reason.
 Ant ?!  ;-)  or platform-specific makefile #includes?
 
 Let's consider both for now :)
 
 There will be files that it makes sense to share for sure (like vmi.h
 and jni.h etc.) but they should be stable-API types that can be
 refreshed across the boundary as required if necessary.
 
 Agreed. I think it would be great if  we can keep our inter-component
 interfaces (like vmi.h) platform independent.
 
 
 Thank you,
 Andrey Chernyshev
 Intel Middleware Products Division
 
 
 Hence, the most